Daria Soboleva

Researcher, Cerebras Systems, Inc

1 paper at NeurIPS 2025

OpenReview· Semantic Scholar· Google Scholar

Poster Session 3

1 paper

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

Power Lines: Scaling laws for weight decay and batch size in LLM pre-training

#901 · Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal, Gavia Gray, Daria Soboleva, Joel Hestness

We derive scaling laws for optimal weight decay and batch size in LLM pre-training, finding optimal (and critical) batch size scales primarily with dataset size; we discuss implications for optimizing time and compute efficiency.