PhD student, Institute of Automation, Chinese Academy of Sciences
1 paper at NeurIPS 2025
A new scaling law formula that accounts for learning rate annealing and can fit and predict full loss curves.