PhD student, University of Bristol
1 paper at NeurIPS 2025
We propose a scale-invariant attention mechanism for transformers and show that it improves performance in out-of-distribution long-context settings.