2 papers across 2 sessions
We show that DiLoCo, a method for communication-efficient language model training, exhibits reliable scaling law behavior.
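DiLoCo is a local-update method: each worker runs many inner optimizer steps independently, and workers synchronize only once per round by averaging parameter deltas ("pseudo-gradients") that an outer optimizer then applies. Below is a minimal sketch of that communication pattern on a toy quadratic objective; the published method uses AdamW for the inner steps and Nesterov momentum for the outer step, and the objective, learning rates, and step counts here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each worker's "data shard" is a quadratic bowl around its own target.
targets = [rng.normal(size=4) for _ in range(2)]

def grad(x, target):
    return 2.0 * (x - target)

def diloco(num_rounds=50, inner_steps=20, inner_lr=0.05,
           outer_lr=0.7, outer_momentum=0.9):
    """Sketch of the DiLoCo loop: many local steps per worker, then one
    averaged pseudo-gradient per round applied by an outer optimizer."""
    x_global = np.zeros(4)
    velocity = np.zeros_like(x_global)
    for _ in range(num_rounds):
        deltas = []
        for target in targets:
            x = x_global.copy()              # every worker starts from shared params
            for _ in range(inner_steps):     # communication-free inner optimization
                x -= inner_lr * grad(x, target)
            deltas.append(x_global - x)      # pseudo-gradient sent to the server
        outer_grad = np.mean(deltas, axis=0)  # one all-reduce per round, not per step
        velocity = outer_momentum * velocity + outer_grad  # Nesterov in the paper
        x_global -= outer_lr * velocity
    return x_global

print(diloco())  # converges toward the mean of the worker targets
```

Communication drops by roughly the inner-step count, since workers exchange one delta per round instead of one gradient per step.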
We propose a stochastic federated learning framework with inherent communication regularization and principled compression via remote source generation. It achieves 5–32× communication savings, backed by theoretical guarantees.
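The summary does not pin down the paper's exact scheme, so the following is only a generic sketch of remote source generation via shared randomness: the client steers the server toward a sample from a target distribution by sending just the index of one candidate drawn from a shared prior, in the style of minimal-random-coding / relative entropy coding. All function names, distributions, and parameters below are hypothetical:

```python
import numpy as np

def encode(target_mean, seed, num_candidates=1024, prior_std=1.0, target_std=0.5):
    """Client side: draw K candidates from a shared prior, then pick one with
    probability proportional to the importance weight target(z) / prior(z)."""
    rng = np.random.default_rng(seed)  # shared randomness with the server
    candidates = rng.normal(0.0, prior_std, size=(num_candidates, target_mean.size))
    log_w = (-((candidates - target_mean) ** 2).sum(axis=1) / (2 * target_std ** 2)
             + (candidates ** 2).sum(axis=1) / (2 * prior_std ** 2))
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    index = np.random.default_rng().choice(num_candidates, p=probs)
    return index  # only log2(K) bits cross the wire

def decode(index, seed, num_candidates=1024, prior_std=1.0, dim=4):
    """Server side: regenerate the identical candidate pool from the shared
    seed and read off the chosen sample."""
    rng = np.random.default_rng(seed)
    candidates = rng.normal(0.0, prior_std, size=(num_candidates, dim))
    return candidates[index]

update = np.array([0.3, -0.1, 0.2, 0.0])  # client's stochastic model update
idx = encode(update, seed=42)
print(decode(idx, seed=42))               # approximately a sample near `update`
```

The cost of sending an index into K shared candidates is log2(K) bits regardless of the update's dimension, which is the general mechanism by which schemes of this flavor obtain large communication savings.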