We use grokking to disentangle generalization from training dynamics and show that relative flatness, not neural collapse, is a necessary and more predictive indicator of generalization in deep networks.