PhD student, University of Bristol
1 paper at NeurIPS 2025
We propose a scale-invariant attention mechanism for transformers and show that it improves performance in out-of-distribution long-context settings.