Researcher, Alibaba Group
2 papers at NeurIPS 2025
We propose LongBioBench, a benchmark for controllable evaluation of long-context language models.
We find that applying a query-dependent, head-specific sigmoid gate after scaled dot-product attention (SDPA) consistently improves performance and scaling properties, and mitigates the 'massive activation' and 'attention sink' phenomena.
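A minimal single-head NumPy sketch of the gating idea: the gate is a sigmoid computed from the query-side hidden state and multiplied elementwise into the SDPA output. All names, shapes, and weight layouts here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sdpa(q, k, v):
    # Scaled dot-product attention for one head; q, k, v have shape (T, d).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def gated_attention(x, wq, wk, wv, wg):
    # Query-dependent gate: computed from the same hidden state x that
    # produces the queries, so each position gets its own gate values.
    # (Hypothetical weight names wq/wk/wv/wg; one head shown for clarity.)
    q, k, v = x @ wq, x @ wk, x @ wv
    out = sdpa(q, k, v)
    gate = 1.0 / (1.0 + np.exp(-(x @ wg)))  # elementwise sigmoid in (0, 1)
    return gate * out                        # gate applied AFTER SDPA

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
wq, wk, wv, wg = (rng.standard_normal((d, d)) for _ in range(4))
y = gated_attention(x, wq, wk, wv, wg)
print(y.shape)  # (4, 8)
```

Because the gate lies in (0, 1), it can only attenuate the attention output; per-head, per-query attenuation is what lets the model suppress the outlier activations associated with attention sinks.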