We show that DiLoCo, a method for communication-efficient language model training, exhibits reliable scaling law behavior.