Researcher, NVIDIA
3 papers at NeurIPS 2025
We provide a systematic exploration and roadmap for latency-optimal small language models through optimized architectural and training strategies.