2 papers across 2 sessions
We use grokking to disentangle generalization from training dynamics and show that relative flatness, not neural collapse, is a necessary condition for, and a more predictive indicator of, generalization in deep networks.
This paper introduces a collection of time-series anomaly detection datasets.