Researcher, NVIDIA
2 papers at NeurIPS 2025
The paper introduces a pruning and distillation method for hybrid LLMs, compressing Nemotron-H 8B to 4B with better accuracy and ~2× faster inference, advancing the efficiency-accuracy trade-off.
Nemotron-CLIMB automates data mixture optimization for pre-training, improving domain adaptation and outperforming Llama-3.2-1B by 2.0% on general reasoning.