2 papers across 2 sessions
We propose a simple way to measure the critical batch size for language model pretraining that alleviates issues with existing methods, and show that this can be used to train language models with fewer gradient steps in practice.
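The summary does not describe the proposed measurement itself; purely as background, here is a minimal sketch of the gradient-noise-scale estimator of McCandlish et al. (2018), presumably one of the "existing methods" referred to, not the method of this paper. The function name and the synthetic gradients are ours, for illustration only.

```python
import numpy as np

def gradient_noise_scale(g_small, g_big, b_small, b_big):
    """Estimate the critical ("noise") batch size B_noise = tr(Sigma) / |G|^2
    from two gradient estimates computed at batch sizes b_small < b_big."""
    s_small = np.sum(g_small ** 2)   # |g_small|^2
    s_big = np.sum(g_big ** 2)       # |g_big|^2
    # Unbiased estimates of the true-gradient squared norm and the noise trace.
    g_norm_sq = (b_big * s_big - b_small * s_small) / (b_big - b_small)
    trace_sigma = (s_small - s_big) / (1.0 / b_small - 1.0 / b_big)
    return trace_sigma / g_norm_sq

# Toy usage with synthetic gradients standing in for per-batch model gradients.
rng = np.random.default_rng(0)
true_grad = rng.normal(size=10_000)
g_small = true_grad + rng.normal(scale=1.0 / np.sqrt(32), size=10_000)
g_big = true_grad + rng.normal(scale=1.0 / np.sqrt(512), size=10_000)
print(gradient_noise_scale(g_small, g_big, b_small=32, b_big=512))
```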
We derive scaling laws for the optimal weight decay and batch size in LLM pretraining, finding that the optimal (and critical) batch size scales primarily with dataset size; we discuss the implications for optimizing time and compute efficiency.
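For illustration only (the functional form is the standard power-law ansatz for such fits; the constants are placeholders, not values reported in the paper), a scaling law of this kind relates the critical batch size $B^{*}$ to the dataset size $D$ (e.g., in tokens) as

$B^{*}(D) \approx B_0 \, D^{\alpha}$, with $B_0$ and $\alpha > 0$ fit empirically from training runs.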