Bradley Settlemyer

Researcher, NVIDIA

1 paper at NeurIPS 2025

OpenReview · Semantic Scholar · Google Scholar

Poster Session 5

1 paper

Friday, December 5, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C, D, E

Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity

#3513 · Susav Shrestha, Bradley Settlemyer, Nikoli Dryden, A. L. Narasimha Reddy

Polar Sparsity scales contextual sparsity to large batches by exploiting stable attention head sparsity and using efficient GPU kernels, achieving up to 2.2× speedups with minimal accuracy loss.