2 papers across 2 sessions
We propose a contextualized position encoding that uses dynamic Householder matrices in place of static rotary ones, along with a hardware-efficient training algorithm that improves state-tracking performance.
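The core contrast is between a static transform determined by absolute position (as in rotary encodings) and an input-dependent orthogonal transform. A minimal sketch of the latter, assuming the reflection vector is derived from the token's hidden state (the variable names and the use of a raw hidden state are illustrative, not the paper's exact construction):

```python
import numpy as np

def householder(v):
    """Householder reflection H = I - 2 v v^T / (v^T v); H is symmetric and orthogonal."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
d = 8
hidden = rng.standard_normal(d)  # stand-in for a token's hidden state (contextualized input)
q = rng.standard_normal(d)       # a query vector to be position-encoded

H = householder(hidden)          # dynamic: depends on content, not on absolute position
q_enc = H @ q                    # encoded query

# Like rotary matrices, H is orthogonal, so it preserves norms and inner products.
assert np.allclose(H @ H.T, np.eye(d))
assert np.allclose(np.linalg.norm(q_enc), np.linalg.norm(q))
```

Because each token contributes its own reflection, products of these matrices across positions can express richer, input-dependent transformations than a fixed rotation schedule, which is what enables the improved state tracking.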
Attention sinks in LLMs serve as geometric reference frames that anchor token representations in high-dimensional space, emerging during training as optimal solutions to the coordinate-system problem and shaped by architecture and position encodings.