1 paper across 1 session
A new scaling-law formula that accounts for learning rate annealing and can fit and predict full loss curves.