3 papers across 2 sessions
Transformer-based language models learn low-dimensional task manifolds across layers; similar trends in intrinsic dimension from layer to layer point to shared compression strategies despite differences in architecture and size.
Lower local intrinsic dimension of the embeddings signals better performance, and tracking it can detect when LLMs improve, overfit, or grok.
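As a concrete illustration of the kind of measurement these summaries refer to, below is a minimal sketch of estimating the intrinsic dimension of a layer's hidden states with the TwoNN estimator (Facco et al., 2017). This is one common estimator; the specific estimators used in the papers above may differ, and the function name and usage are illustrative.

```python
# Sketch: intrinsic-dimension estimate of transformer hidden states via TwoNN.
# Assumes numpy and scikit-learn; not the papers' exact method.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(X: np.ndarray) -> float:
    """Estimate the intrinsic dimension of points X with shape (n_samples, n_features)."""
    # Distances to the two nearest neighbors (column 0 is the point itself).
    nn = NearestNeighbors(n_neighbors=3).fit(X)
    dists, _ = nn.kneighbors(X)
    r1, r2 = dists[:, 1], dists[:, 2]
    mu = r2 / r1                           # ratio of 2nd to 1st neighbor distance
    mu = mu[np.isfinite(mu) & (mu > 1.0)]  # drop duplicates / degenerate points
    # Maximum-likelihood fit of the Pareto law F(mu) = 1 - mu^(-d).
    return len(mu) / np.sum(np.log(mu))

# Hypothetical usage: hidden_states[layer] holds (n_tokens, d_model) activations
# extracted from a transformer; the per-layer profile can then be compared
# across models or tracked over training.
# id_profile = [twonn_intrinsic_dimension(h) for h in hidden_states]
```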