Researcher, Tsinghua University
1 paper at NeurIPS 2025
A new scaling law formula that accounts for learning rate annealing and can fit and predict full loss curves.