2 papers across 2 sessions
We analytically show that deep neural collapse is suboptimal in deep unconstrained feature models trained with the cross-entropy loss, and we explain why it nonetheless persists empirically.
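As a rough sketch of the setting (the notation here is our own, not taken from the paper), the deep unconstrained feature model treats the first-layer features as free optimization variables and trains the remaining layers on them:

$$
\min_{H_1,\,W_2,\dots,W_L}\;\frac{1}{N}\,\mathcal{L}_{\mathrm{CE}}\!\Big(W_L\,\sigma\big(W_{L-1}\cdots\sigma(W_2 H_1)\big),\,Y\Big)\;+\;\frac{\lambda}{2}\Big(\|H_1\|_F^2+\sum_{l=2}^{L}\|W_l\|_F^2\Big)
$$

Here $H_1$ collects the $N$ free feature vectors, $\sigma$ is the nonlinearity, and $Y$ are the labels. Deep neural collapse would mean every intermediate feature matrix has zero within-class variability, with class means forming a maximally separated (simplex-ETF-like) configuration; the claim is that such solutions are not global optima of this objective.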
We show that Schedule-Free methods effectively navigate the "river" structure of the loss landscape, enabling scalable language model training without learning-rate decay schedules or extra optimizer memory.
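As a minimal sketch of the Schedule-Free update (following Defazio et al., "The Road Less Scheduled"; the toy objective and hyperparameters below are illustrative choices of ours, not from the paper), the method keeps a base iterate z and an averaged iterate x, and takes gradients at an interpolation y between them:

```python
# Schedule-Free SGD sketch on a hypothetical quadratic f(y) = 0.5 * ||A y||^2.
import numpy as np

def grad(y):
    # Gradient of the toy objective (stands in for a stochastic LM gradient).
    return A.T @ (A @ y)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))

lr, beta, steps = 0.01, 0.9, 1000
z = rng.standard_normal(10)  # base iterate: takes the gradient steps
x = z.copy()                 # averaged iterate: used for evaluation

for t in range(1, steps + 1):
    y = (1 - beta) * z + beta * x   # gradients are evaluated at the interpolation y
    z = z - lr * grad(y)            # plain SGD step on the base iterate
    c = 1.0 / t                     # uniform averaging weight
    x = (1 - c) * x + c * z         # running average of the z iterates

print("final loss:", 0.5 * np.sum((A @ x) ** 2))
```

The uniform averaging of x plays the role a learning-rate decay schedule usually would, and only the two buffers z and x are stored, which is how the approach avoids both decay schedules and additional optimizer state.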