We describe the training of overparametrized architectures with small weight decay as a two-phase dynamics. In particular, during the second phase, the parameters follow a Riemannian gradient flow of the norm on the interpolation manifold, i.e. the set of parameters that fit the training data exactly.
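The sketch below illustrates the two phases under stated assumptions. It uses a linear model as a hypothetical stand-in for an overparametrized architecture. It has one training point x = (1, 1), target y = 2, step size 0.1 and weight decay 1e-3, none of which come from the source. Here the interpolation manifold is the line {w : x·w = y}. The Riemannian gradient flow of ½‖w‖² on it (which only reparametrizes time relative to the flow of ‖w‖) is dw/dτ = −Pw, where P projects onto the tangent direction (1, −1)/√2 and τ = ηλt is a slow time. Gradient descent first reaches the manifold, up to an O(λ) offset, in O(1/η) steps. It then drifts along it on the O(1/(ηλ)) time scale toward the minimum-norm interpolator (1, 1), as that flow predicts.

```python
import math

# Hypothetical toy setup (not from the source): a linear model with one data
# point x in R^2 and target y. The interpolation manifold is {w : x . w = y}.
X = (1.0, 1.0)
Y = 2.0
ETA = 0.1   # step size (assumption)
LAM = 1e-3  # small weight decay (assumption)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def residual(w):
    """Signed distance from fitting the data; zero exactly on the manifold."""
    return dot(X, w) - Y


def tangent_coord(w):
    """Coordinate of w along the manifold's tangent direction (1, -1)/sqrt(2)."""
    return (w[0] - w[1]) / math.sqrt(2.0)


def gd_step(w):
    """One gradient step on 0.5*(x.w - y)^2 + 0.5*LAM*||w||^2."""
    r = residual(w)
    return tuple(wi - ETA * (xi * r + LAM * wi) for wi, xi in zip(w, X))


def run(w, steps):
    for _ in range(steps):
        w = gd_step(w)
    return w


W0 = (2.0, -2.0)  # starts off the manifold: x . w0 = 0, not 2
c0 = tangent_coord(W0)

# Phase 1 (fast, O(1/ETA) steps): the data term dominates, the iterate lands
# within O(LAM) of the manifold, and the tangent coordinate barely moves.
w_fast = run(W0, 100)

# Phase 2 (slow, O(1/(ETA*LAM)) steps): weight decay moves the iterate along
# the manifold; the tangent coordinate follows dc/dtau = -c, tau = ETA*LAM*t.
T = 50_000
w_slow = run(w_fast, T - 100)
predicted_c = c0 * math.exp(-ETA * LAM * T)

print(f"after 100 steps: residual={residual(w_fast):.2e}, c={tangent_coord(w_fast):.4f}")
print(f"after {T} steps: residual={residual(w_slow):.2e}, "
      f"c={tangent_coord(w_slow):.5f}, flow prediction={predicted_c:.5f}")
```

In this linear case the split between phases is exact: the data gradient has no component along the manifold, so the tangent coordinate shrinks by exactly a factor (1 − ηλ) per step. The source concerns general overparametrized architectures, where the manifold is curved; this toy example does not reproduce that analysis.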