Assistant Professor, Télécom Paris
1 paper at NeurIPS 2025
We describe the training of overparametrized architectures with small weight decay as a two-phase dynamic. In particular, during the second phase, training follows a Riemannian flow of the parameter norm on the interpolation manifold (the set of parameters that fit the training data exactly).
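The two phases can be illustrated in the simplest setting where they are easy to check. The sketch below makes strong simplifying assumptions that do not come from the paper: an overparametrized linear least-squares model (more parameters `d` than samples `n`) trained by plain gradient descent with a small weight decay `wd`. Here the interpolation manifold {θ : Xθ = y} is an affine subspace whose tangent space is the null space of `X`, so the Riemannian gradient of ½‖θ‖² on it is simply the null-space projection of θ. The data, step size, and the function name `two_phase_demo` are hypothetical choices for illustration only. For nonlinear architectures the manifold is curved, so this linear toy shows only the structure of the two phases, not the general result.

```python
import numpy as np


def two_phase_demo(n=5, d=50, wd=1e-2, steps=50_000,
                   checkpoints=(0, 500, 50_000), seed=0):
    """Toy illustration (hypothetical setup): gradient descent on
    0.5 * mean((X @ theta - y)**2) + 0.5 * wd * ||theta||**2
    for an overparametrized linear model (d > n)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)
    theta = rng.normal(size=d)

    # Projector onto null(X): the tangent space of the affine
    # interpolation manifold {theta : X @ theta = y}.
    P_null = np.eye(d) - np.linalg.pinv(X) @ X

    # Step size 1/L, where L is the smoothness constant of the objective.
    lr = 1.0 / (np.linalg.eigvalsh(X.T @ X / n).max() + wd)

    log = {}
    for t in range(steps + 1):
        if t in checkpoints:
            loss = 0.5 * np.mean((X @ theta - y) ** 2)
            log[t] = {"loss": loss,
                      "null_norm": np.linalg.norm(P_null @ theta)}
        if t == steps:
            break
        # The loss gradient lies in the row space of X, so only the
        # weight-decay term acts on the null-space (tangent) component.
        grad = X.T @ (X @ theta - y) / n + wd * theta
        theta = theta - lr * grad

    theta_min_norm = np.linalg.pinv(X) @ y  # minimum-norm interpolator
    return theta, theta_min_norm, log, lr
```

Calling `two_phase_demo()` and inspecting `log` should show the loss collapsing within the first few hundred steps while `null_norm` is still close to its initial value (phase one), followed by `null_norm` shrinking by exactly a factor (1 - lr·wd) per step over a much longer horizon of order 1/(lr·wd) steps (phase two). The iterates end near, but not exactly at, the minimum-norm interpolator, because a nonzero weight decay leaves an O(wd) bias toward the ridge solution.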