Research Scientist, Cerebras Systems, Inc.
2 papers at NeurIPS 2025
We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer, FLOP savings when training deep models, and a larger range of compute-efficient width/depth ratios.
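To make the depth-wise idea concrete, the sketch below shows one common ingredient of depth parameterizations: scaling each residual branch by 1/L so that per-block updates stay comparable as depth grows, which lets a learning rate tuned at small depth carry over to larger depths. The module names, the MLP branch, and the bare 1/L factor are illustrative assumptions, not CompleteP's full prescription.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch output is scaled by 1/L (total depth).

    With this scaling, each block's contribution shrinks as depth grows,
    so per-block updates remain comparable across depths -- the intuition
    behind depth-wise HP transfer. (Illustrative sketch, not CompleteP.)
    """
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.LayerNorm(width),
            nn.Linear(width, 4 * width),
            nn.GELU(),
            nn.Linear(4 * width, width),
        )
        self.scale = 1.0 / depth  # assumed depth-wise scaling factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.branch(x)

def make_trunk(width: int, depth: int) -> nn.Sequential:
    return nn.Sequential(*[ScaledResidualBlock(width, depth) for _ in range(depth)])

if __name__ == "__main__":
    # Only `depth` changes between these two models; under depth-wise
    # scaling, the same base learning rate can be reused for both.
    shallow = make_trunk(width=256, depth=4)
    deep = make_trunk(width=256, depth=24)
    x = torch.randn(8, 16, 256)
    print(shallow(x).shape, deep(x).shape)  # both torch.Size([8, 16, 256])
```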
We derive scaling laws for optimal weight decay and batch size in LLM pre-training, finding that the optimal (and critical) batch size scales primarily with dataset size; we discuss the implications for optimizing time and compute efficiency.
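A claim that optimal batch size scales primarily with dataset size suggests a power-law form of roughly B* ≈ c · D^β. The sketch below shows how such a law can be fit from measurements by linear regression in log-log space; the data points and the resulting exponent are placeholder values invented for illustration, not the paper's results.

```python
import numpy as np

# Hypothetical (D, B*) pairs: dataset size in tokens vs. measured
# optimal batch size. These numbers are made up for illustration.
D = np.array([1e9, 4e9, 1.6e10, 6.4e10])
B_opt = np.array([0.25e6, 0.5e6, 1.0e6, 2.0e6])  # tokens per batch

# Fit B* = c * D^beta via least squares in log-log space.
beta, log_c = np.polyfit(np.log(D), np.log(B_opt), deg=1)
c = np.exp(log_c)
print(f"fitted exponent beta = {beta:.3f}, coefficient c = {c:.3g}")

# Extrapolate the fitted law to a larger dataset size.
D_new = 2.56e11
print(f"predicted optimal batch size at D={D_new:.2e}: {c * D_new**beta:.3g} tokens")
```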