Batched Inference

1 paper across 1 session

Poster Session 5

Friday, December 5, 2025 · 11:00 AM → 2:00 PM

Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity

#3513 · Susav Shrestha, Bradley Settlemyer, Nikoli Dryden, Narasimha Reddy

Polar Sparsity scales contextual sparsity to large batches by exploiting stable attention-head sparsity and efficient GPU kernels, achieving speedups of up to 2.2× with minimal accuracy loss.
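
To make the idea of per-sequence attention-head sparsity in a batch concrete, here is a minimal NumPy sketch. It is not the authors' implementation or their GPU kernels. The names `router_scores`, `select_heads_per_sequence`, and `apply_head_mask` are hypothetical, and the random scores stand in for whatever signal a real system would use to decide which heads matter for each sequence. The sketch only shows that when each sequence keeps its own top-k heads, the fraction of active heads per sequence is k / heads regardless of batch size.

```python
import numpy as np


def select_heads_per_sequence(router_scores, k):
    """Boolean mask of shape (batch, heads) keeping the k highest-scoring
    heads for each sequence. Assumes 1 <= k <= heads."""
    batch, heads = router_scores.shape
    # argpartition puts the indices of the k largest scores in the last k slots of each row.
    top_idx = np.argpartition(router_scores, heads - k, axis=1)[:, heads - k:]
    mask = np.zeros((batch, heads), dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    return mask


def apply_head_mask(head_outputs, mask):
    """Zero the outputs of unselected heads.

    head_outputs has shape (batch, heads, head_dim). This dense emulation
    saves no compute; a real sparse kernel would skip the inactive heads.
    """
    return head_outputs * mask[:, :, None]


# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
batch, heads, head_dim, k = 8, 16, 4, 4
router_scores = rng.normal(size=(batch, heads))
head_outputs = rng.normal(size=(batch, heads, head_dim))

mask = select_heads_per_sequence(router_scores, k)
sparse_outputs = apply_head_mask(head_outputs, mask)

# Each sequence activates exactly k of its heads, independent of batch size.
active_fraction = mask.sum(axis=1) / heads
```

Because the mask is computed per sequence, adding more sequences to the batch adds rows to `mask` without changing how many heads any single sequence uses.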