Full Professor, Texas A&M University - College Station
1 paper at NeurIPS 2025
Polar Sparsity scales contextual sparsity to large batch sizes by exploiting attention-head sparsity, which stays stable as the batch grows, together with efficient GPU kernels, achieving speedups of up to 2.2× with minimal accuracy loss.