We introduce a Functional Scaling Law that predicts full SGD loss dynamics under arbitrary learning rate schedules.