2 papers across 2 sessions
This work frames grokking as computational glass relaxation, explains it from the perspective of Boltzmann entropy, and proposes a physics-based, grokking-resistant optimizer.
A lower local intrinsic dimension of the embeddings signals better performance and helps detect when LLMs improve, overfit, or grok.