Principal Researcher, Allen Institute for Artificial Intelligence
2 papers at NeurIPS 2025
We propose a simple way to measure the critical batch size for language model pretraining that alleviates issues with existing methods, and we show that this approach can be used to train language models with fewer gradient steps in practice.