Transformer-based language models learn low-dimensional task manifolds across their layers: the intrinsic dimension of hidden representations follows similar profiles over depth across models, suggesting shared compression strategies despite differences in architecture and size.
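The per-layer measurement behind this claim can be sketched concretely. Below is a minimal example, assuming a TwoNN-style estimator (Facco et al., 2017) is used to measure the intrinsic dimension of each layer's hidden-state matrix; the paper's exact estimator and preprocessing are not specified here, and `hidden_states_per_layer` is a hypothetical list of per-layer activation matrices, one row per token or per input.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate (Facco et al., 2017).

    Uses the ratio mu = r2/r1 of each point's second- to first-
    nearest-neighbor distance; under the TwoNN model mu follows a
    Pareto distribution whose exponent equals the intrinsic
    dimension, with maximum-likelihood estimate N / sum(log mu).
    """
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = dists[:, 1], dists[:, 2]   # column 0 is the point itself
    mu = r2[r1 > 0] / r1[r1 > 0]        # drop exact duplicates
    return mu.size / np.log(mu).sum()

# Sanity check: a 5-d Gaussian linearly embedded in a 512-d ambient
# space should yield an ID estimate near 5.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 5))
X = latent @ rng.normal(size=(5, 512))
print(f"estimated ID: {twonn_id(X):.1f}")

# Per-layer analysis (hypothetical data): apply the estimator to each
# layer's hidden states and compare the ID-vs-depth profiles across
# models to look for the shared compression pattern described above.
# ids = [twonn_id(h) for h in hidden_states_per_layer]
```

Comparing these depth profiles, rather than any single layer's value, is what lets the analysis speak to compression strategies across differently sized architectures.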