Attention sinks in LLMs serve as geometric reference frames that anchor token representations in high-dimensional space; they emerge during training as optimal solutions to the coordinate-system problem and are shaped by the model's architecture and position encodings.
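
A minimal sketch of how the sink effect can be observed in practice, assuming a Hugging Face causal LM (the choice of `gpt2` is illustrative, not from the paper): it measures how much attention mass later tokens place on the first token at each layer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative model; any causal LM exposing attentions works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Attention sinks anchor token representations in high-dimensional space."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len)
for layer_idx, attn in enumerate(outputs.attentions):
    # Average over heads, then take the attention that tokens 1..n-1 pay to token 0.
    sink_mass = attn[0].mean(dim=0)[1:, 0].mean().item()
    print(f"layer {layer_idx:2d}: mean attention on first token = {sink_mass:.3f}")
```

If the sink behavior described above is present, this fraction is typically much larger than a uniform baseline of 1/seq_len in many layers, consistent with the first token acting as a fixed reference point for the rest of the sequence.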