Kurt Keutzer

Full Professor, University of California, Berkeley

5 papers at NeurIPS 2025

Poster Session 3

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

#110 Spotlight · Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica

A dataset of multi-agent system traces, and a systematic analysis of failures in multi-agent LLM systems, featuring a structured taxonomy and an automated evaluation pipeline.

Poster Session 4

1 paper

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Radial Attention: $\mathcal O(n \log n)$ Sparse Attention for Long Video Generation

#5414 · Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

A sparse attention mechanism with $\mathcal O(n \log n)$ complexity for long video generation.

Poster Session 6

3 papers

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

#3508 Spotlight · Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica

We propose a method to speed up video diffusion generation through efficient attention.

Angles Don’t Lie: Unlocking Training‑Efficient RL Through the Model’s Own Signals

#310 Spotlight · Qinsi Wang, Jinghan Ke, Hancheng Ye, Yueqian Lin, Yuzhe Fu, Jianyi Zhang, Kurt Keutzer, Chenfeng Xu, Yiran Chen

We present a model-aware approach that leverages the model’s own signals to dynamically choose training data, markedly boosting both training and data efficiency in RL fine-tuning.

Multipole Attention for Efficient Long Context Reasoning

#3518 · Coleman Richard Charles Hooper, Sebastian Zhao, Luca Manolache, Sehoon Kim, Michael W. Mahoney, Sophia Shao, Kurt Keutzer, Amir Gholami

Accelerating attention for long-context reasoning by identifying and loading important tokens and by approximating attention to less important tokens.