3 papers across 2 sessions
We propose a contextualized position encoding using dynamic Householder matrices in place of static rotary ones, along with a hardware-efficient training algorithm that improves state tracking performance.
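As background for the summary above, the following is a minimal sketch of what a Householder matrix does compared with a 2D rotation of the kind used in rotary encodings. It is not the paper's method or training algorithm. The function names `householder_apply` and `rotate_2d` are hypothetical, and the vector `v` is an arbitrary fixed example, whereas a contextualized encoding would presumably derive it from the input.

```python
import math


def householder_apply(v, x):
    """Apply H = I - 2 v v^T / (v^T v) to x without building the matrix H.

    H reflects x across the hyperplane orthogonal to v.
    """
    vv = sum(a * a for a in v)
    if vv == 0.0:
        raise ValueError("v must be non-zero")
    scale = 2.0 * sum(a * b for a, b in zip(v, x)) / vv
    return [b - scale * a for a, b in zip(v, x)]


def rotate_2d(theta, x):
    """Rotate a 2D vector by angle theta (a static rotation, for comparison)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in x))


if __name__ == "__main__":
    x = [3.0, 4.0]
    v = [1.0, 0.0]  # hypothetical reflection direction
    reflected = householder_apply(v, x)
    rotated = rotate_2d(math.pi / 2, x)
    print("reflected:", reflected, "norm:", norm(reflected))
    print("rotated:  ", rotated, "norm:", norm(rotated))
```

Both transforms are orthogonal, so they preserve vector norms. The difference is that a Householder matrix is a reflection determined by a direction vector, and applying it twice returns the original vector, while a rotation is determined by an angle.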
We propose a novel hybrid position embedding that improves the length generalization of Vision-Language Models.