2 papers across 2 sessions
We find a small set of neurons whose activations can be redirected at test time to mitigate high-norm artifacts in Vision Transformers.
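One way to picture this kind of intervention: a test-time forward hook that edits a handful of MLP hidden activations. The PyTorch sketch below is illustrative only, not the paper's code; the neuron indices, the target layer, and the zeroing rule are all placeholder assumptions.

```python
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Hypothetical neuron indices; the paper identifies a specific small set,
# these are placeholders for illustration only.
ARTIFACT_NEURONS = [11, 42, 397]

def redirect_hook(module, inputs, output):
    # output: (batch, tokens, hidden) activations of the MLP's first linear.
    # Zeroing is one simple "redirection"; the actual intervention may
    # differ (e.g., rescaling or projecting onto a target direction).
    output[..., ARTIFACT_NEURONS] = 0.0
    return output

model = vit_b_16(weights=ViT_B_16_Weights.DEFAULT).eval()

# Hook the hidden layer of the last block's MLP (an assumed locus; which
# layer matters depends on where the high-norm neurons actually live).
handle = model.encoder.layers[-1].mlp[0].register_forward_hook(redirect_hook)

with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))  # forward pass with the edit active

handle.remove()  # detach the hook to restore the unmodified model
```

Because the edit lives in a hook rather than in the weights, it costs nothing at training time and can be toggled per forward pass.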
Attention sinks in LLMs serve as geometric reference frames that anchor token representations in high-dimensional space; they emerge during training as optimal solutions to the coordinate-system problem, shaped by architecture and positional encodings.
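The sink phenomenon itself is easy to observe: later tokens concentrate attention mass on token 0. A minimal probe, assuming a Hugging Face GPT-2 as a stand-in model (the paper's models and exact metric may differ):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("Attention sinks anchor token representations.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

for layer, attn in enumerate(out.attentions):
    # attn: (batch, heads, query, key). Average the attention that every
    # query after the first assigns to key position 0, across heads.
    sink_mass = attn[0, :, 1:, 0].mean().item()
    print(f"layer {layer:2d}: attention to token 0 = {sink_mass:.3f}")
```

A large per-layer value for token 0 relative to a uniform baseline is the usual signature of a sink.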