Associate Professor, Georgia Institute of Technology
Two papers accepted at NeurIPS 2025
We provide a systematic exploration of, and a roadmap toward, latency-optimal small language models through targeted architectural and training strategies.
Nemotron-CLIMB automates data-mixture optimization for pre-training, improving domain adaptation and outperforming Llama-3.2-1B by 2.0% on general reasoning benchmarks.