Undergrad student, Korea Advanced Institute of Science & Technology
1 paper at NeurIPS 2025
We show that Schedule-Free methods effectively navigate the river structure of the loss landscape, enabling scalable language model training without learning-rate decay schedules or extra memory overhead.