Polar Sparsity scales contextual sparsity to large batches by exploiting stable attention head sparsity and using efficient GPU kernels, achieving up to 2.2× speedups with minimal accuracy loss.
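One way to see why head-level sparsity can carry over to large batches is a toy count of the work done when every sequence keeps only its own top-scoring heads. The sketch below is illustrative only and is not the paper's implementation: the names `select_heads` and `batch_head_work`, the top-k selection rule, and the random scores standing in for a learned router are all assumptions made for this example.

```python
import random


def select_heads(head_scores, k):
    """Return the indices of the k highest-scoring heads for one sequence."""
    ranked = sorted(range(len(head_scores)),
                    key=lambda h: head_scores[h], reverse=True)
    return sorted(ranked[:k])


def batch_head_work(batch_scores, k):
    """Fraction of (sequence, head) pairs computed when each sequence
    keeps only its own k selected heads."""
    active = [select_heads(scores, k) for scores in batch_scores]
    total = sum(len(scores) for scores in batch_scores)
    return sum(len(a) for a in active) / total


# Hypothetical setup: random scores stand in for per-sequence head importance.
random.seed(0)
num_heads, k = 32, 8
for batch_size in (1, 16, 256):
    scores = [[random.random() for _ in range(num_heads)]
              for _ in range(batch_size)]
    print(batch_size, batch_head_work(scores, k))
```

Because each sequence's attention is computed independently, the fraction of head computations in this toy model stays at `k / num_heads` (0.25 here) whatever the batch size. Turning that saving into wall-clock speedup would still depend on GPU kernels that can skip the unselected heads efficiently, which this sketch does not model.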