Researcher, NVIDIA
3 papers at NeurIPS 2025
We present JetLM, a new family of language models that matches leading full-attention models while significantly improving generation throughput.
We present the first systematic study of lossy latency–quality trade-offs in LLM agents, introduce the HFTBench and StreetFighter benchmarks, and propose an adaptive mixed-precision framework for real-world latency-sensitive tasks.
We propose a method to speed up video diffusion generation through efficient attention.