PhD student, Massachusetts Institute of Technology
3 papers at NeurIPS 2025
We propose a contextualized position encoding using dynamic Householder matrices in place of static rotary ones, along with a hardware-efficient training algorithm that improves state tracking performance.
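A static rotary embedding applies the same position-indexed rotation regardless of content; the contextual variant described here instead builds an orthogonal Householder transform from the token itself. A minimal NumPy sketch of the idea (the projection `W` and the use of a single reflection per token are illustrative assumptions, not the paper's actual parameterization):

```python
import numpy as np

def householder(v):
    # H = I - 2 v v^T / ||v||^2: an orthogonal reflection determined by v
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d)) / np.sqrt(d)  # hypothetical learned projection
x = rng.standard_normal(d)                    # a token's hidden state
v = W @ x                                     # context-dependent reflection vector
H = householder(v)                            # dynamic, token-conditioned transform
q_transformed = H @ x                         # applied like a rotary rotation

# like a rotary matrix, H is orthogonal, so it preserves norms and inner products
assert np.allclose(H @ H.T, np.eye(d))
```

Because each Householder matrix is orthogonal, it keeps the norm-preserving property that makes rotary encodings well behaved, while letting the transform vary with context rather than with position alone.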
A sparse attention mechanism with $\mathcal O(n \log n)$ complexity for long video generation.
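One common way such subquadratic complexity arises is to let each query attend to $\mathcal O(\log n)$ keys. The sketch below uses a log-strided pattern purely for illustration; the actual sparsity pattern in the work above is not specified here and may differ:

```python
import numpy as np

def log_sparse_mask(n):
    # each query i attends to itself and to keys at power-of-two offsets
    # behind it, giving O(log n) keys per query and O(n log n) pairs total
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, i] = True
        k = 1
        while i - k >= 0:
            mask[i, i - k] = True
            k *= 2
    return mask

n = 1024
mask = log_sparse_mask(n)
# attended pairs grow as n log n, far below the dense n**2 = 1_048_576
print(mask.sum())
```

Applying this mask before the softmax means only the marked query-key pairs need their scores computed, which is where the complexity saving comes from.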
We find that applying a query-dependent, head-specific sigmoid gate after scaled dot-product attention (SDPA) consistently improves performance and scaling properties, and mitigates the `massive activation' and `attention sink' phenomena.
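The gate multiplies each head's SDPA output by a per-query scalar in $(0, 1)$ computed from the query itself. A minimal single-head NumPy sketch (the gate parameterization `sigmoid(q @ w_gate)` is an assumed simple form, not necessarily the exact one used):

```python
import numpy as np

def sdpa(q, k, v):
    # standard scaled dot-product attention for one head
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def gated_sdpa(q, k, v, w_gate):
    # query-dependent gate: a sigmoid of a projection of each query,
    # with w_gate being a separate (head-specific) parameter vector
    out = sdpa(q, k, v)
    gate = 1.0 / (1.0 + np.exp(-(q @ w_gate)))  # one scalar in (0, 1) per query
    return gate[:, None] * out

rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
w_gate = rng.standard_normal(d)  # hypothetical per-head gate weights
out = gated_sdpa(q, k, v, w_gate)
```

Since the gate can scale a head's contribution toward zero for a given query, the model no longer needs to dump probability mass on a sink token to suppress a head, which is one intuition for why such gating helps.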