2 papers across 2 sessions
We derive scaling laws for optimal weight decay and batch size in LLM pre-training, finding that the optimal (and critical) batch size scales primarily with dataset size; we discuss the implications for time and compute efficiency.
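As a minimal sketch of what such a scaling law looks like in practice, the snippet below evaluates a power-law fit of optimal batch size against dataset size, B_opt(D) = B_0 * (D / D_0)^beta. The functional form is the standard one for scaling laws, but the constants (b0, d0, beta) are illustrative assumptions, not fitted values from the paper.

```python
def optimal_batch_size(num_tokens: float,
                       b0: float = 256.0,    # assumed reference batch size (sequences)
                       d0: float = 1e9,      # assumed reference dataset size (tokens)
                       beta: float = 0.4) -> float:
    """Power-law scaling rule B_opt = b0 * (D / d0) ** beta.

    All constants are illustrative; a real fit would estimate them from
    sweeps over batch size at several dataset sizes.
    """
    return b0 * (num_tokens / d0) ** beta


if __name__ == "__main__":
    # Larger datasets tolerate (and benefit from) larger batches.
    for d in (1e9, 1e10, 1e11):
        print(f"D = {d:.0e} tokens -> B_opt ~ {optimal_batch_size(d):,.0f} sequences")
```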
We propose AlphaDecay, a method that assigns per-module weight decay guided by each module's heavy-tailedness, improving large language model performance.
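To illustrate the general shape of a per-module scheme like this, here is a hedged PyTorch sketch: it estimates a heavy-tail exponent for each Linear module's weight spectrum with a Hill estimator, then builds optimizer parameter groups whose decay scales with that exponent. The helpers `hill_alpha` and `alpha_weighted_decay`, and in particular the direction of the alpha-to-decay mapping, are illustrative assumptions rather than the paper's exact procedure.

```python
import torch


def hill_alpha(weight: torch.Tensor, k_frac: float = 0.1) -> float:
    """Hill estimate of the power-law exponent of the weight spectrum.

    Uses the eigenvalues of W^T W (squared singular values of W); a smaller
    alpha indicates a heavier tail. The +1 converts the tail index to the
    density-exponent convention (an assumed choice for this sketch).
    """
    eigs = torch.linalg.svdvals(weight.detach().float()) ** 2
    eigs, _ = torch.sort(eigs, descending=True)
    k = max(int(k_frac * eigs.numel()), 2)
    top = eigs[:k]
    return 1.0 + k / (torch.log(top / top[-1]).sum().item() + 1e-12)


def alpha_weighted_decay(model: torch.nn.Module, base_wd: float = 0.1):
    """Per-module weight decay scaled by each Linear module's alpha.

    Heavier-tailed modules (smaller alpha) get weaker decay here, with the
    mean decay normalized to base_wd; the real AlphaDecay rule may differ.
    """
    linears = [(n, m) for n, m in model.named_modules()
               if isinstance(m, torch.nn.Linear)]
    alphas = {n: hill_alpha(m.weight) for n, m in linears}
    mean_a = sum(alphas.values()) / len(alphas)
    groups = [{"params": [m.weight],
               "weight_decay": base_wd * alphas[n] / mean_a}
              for n, m in linears]
    # Remaining parameters (biases, norms, embeddings) get no decay.
    decayed = {id(m.weight) for _, m in linears}
    rest = [p for p in model.parameters() if id(p) not in decayed]
    groups.append({"params": rest, "weight_decay": 0.0})
    return groups


if __name__ == "__main__":
    model = torch.nn.Sequential(torch.nn.Linear(512, 512),
                                torch.nn.GELU(),
                                torch.nn.Linear(512, 512))
    opt = torch.optim.AdamW(alpha_weighted_decay(model), lr=3e-4)
```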