Postdoc, Technische Universität Dortmund
1 paper at NeurIPS 2025
We use grokking to disentangle generalization from training dynamics and show that relative flatness, not neural collapse, is necessary for, and more predictive of, generalization in deep networks.