Mostofa Patwary

Principal Researcher, NVIDIA

3 papers at NeurIPS 2025

Homepage · OpenReview · Semantic Scholar · Google Scholar

Poster Session 1

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning

#116 Spotlight · Jaehun Jung, Seungju Han, Ximing Lu, Skyler Hallinan, David Acuna, Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi

We present G-Vendi, a data diversity measure that strongly correlates with LLM reasoning generalization on out-of-distribution (OOD) benchmarks; we use this insight to diversify synthetic reasoning data, which yields state-of-the-art distilled models for NLI and math reasoning.

Poster Session 2

1 paper

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

#3518 · Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Marcin Chochowski, Yashaswi Karnati, Raviraj Bhuminand Joshi, Ameya Sunil Mahabaleshwarkar, ZIJIA CHEN, Yoshi Suhara, Oluwatobi Olabiyi, Daniel Korzekwa, Mostofa Patwary, Mohammad Shoeybi, Jan Kautz, Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov

The paper introduces a pruning and distillation method for hybrid LLMs that compresses Nemotron-H 8B to 4B with better accuracy and ~2× faster inference, advancing the efficiency-accuracy trade-off.

Poster Session 4

1 paper

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

#111 Spotlight · Shizhe Diao, Yu Yang, Yonggan Fu, Xin Dong, Dan SU, Markus Kliegl, ZIJIA CHEN, Peter Belcak, Yoshi Suhara, Hongxu Yin, Mostofa Patwary, Yingyan Celine Lin, Jan Kautz, Pavlo Molchanov

Nemotron-CLIMB automates data mixture optimization for pre-training, improving domain adaptation and outperforming Llama-3.2-1B by 2.0% on general reasoning.