Researcher, NVIDIA
1 paper at NeurIPS 2025
We provide a systematic exploration of, and roadmap for, latency-optimal small language models through optimized architectural and training strategies.