Student, Korea Advanced Institute of Science and Technology
1 paper at NeurIPS 2025
We identify a problem in sparse attention inference and propose a simple solution that substantially improves performance while maintaining low latency.