Maksim Khadkevich

Researcher, NVIDIA

1 paper at NeurIPS 2025

OpenReview · Semantic Scholar · Google Scholar

Poster Session 3

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C, D, E

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

#5308 · Yonggan Fu, Xin Dong, Shizhe Diao, Matthijs Van keirsbilck, Hanrong Ye, Wonmin Byeon, Yashaswi Karnati, Lucas Liebenwein, Maksim Khadkevich, Alexander Keller, Jan Kautz, Yingyan Celine Lin, Pavlo Molchanov

We provide a systematic exploration and roadmap for latency-optimal small language models through optimized architectural and training strategies.