2 papers across 2 sessions
Based on a novel, rigorous statistical hypothesis test, the token embeddings of LLMs are unlikely to form low-curvature manifolds.
We characterize the structure of embeddings obtained via gradient descent, showing that the attention mechanism provably selects important tokens.
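The low-curvature claim above can be probed numerically in a crude way: fit a tangent plane to each point's neighborhood via local PCA and measure how much variance falls off the plane. This is a hypothetical illustration only — the function, neighborhood size `k`, and assumed manifold dimension here are my assumptions, not the paper's actual statistical test.

```python
import numpy as np

def local_flatness_residual(points, k=20):
    """For each point, fit a tangent plane to its k nearest neighbors via PCA
    and return the mean fraction of variance left off that plane — a crude
    curvature proxy (illustrative only, not the paper's hypothesis test)."""
    residuals = []
    for p in points:
        d = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(d)[1:k + 1]]        # k nearest neighbors (excluding p)
        centered = nbrs - nbrs.mean(axis=0)
        s = np.linalg.svd(centered, compute_uv=False) ** 2  # variances along PCA axes
        # assume a 2-D manifold: variance beyond the first two components is "off-plane"
        residuals.append(s[2:].sum() / s.sum())
    return float(np.mean(residuals))

rng = np.random.default_rng(0)
# flat 2-D plane embedded in 3-D: residual should be near zero
plane = np.c_[rng.uniform(-1, 1, (500, 2)), np.zeros(500)]
# unit sphere: positive curvature, so the residual should be noticeably larger
sph = rng.normal(size=(500, 3))
sph /= np.linalg.norm(sph, axis=1, keepdims=True)
print(local_flatness_residual(plane) < local_flatness_residual(sph))
```

On the flat plane the third principal variance is exactly zero, so the residual vanishes; on the sphere the curved patches leave variance off every fitted plane, so the comparison prints `True`.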