Researcher, Saarland Informatics Campus, Max-Planck Institute
1 paper at NeurIPS 2025
We interpret attention as a discrete-time Markov chain and show the effectiveness of this interpretation on various downstream tasks.
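
As a rough illustration of the idea (not the paper's actual method, which is not described here): each row of a softmax attention matrix is non-negative and sums to 1, so the matrix can be read as the transition matrix of a discrete-time Markov chain over tokens. The sketch below assumes plain single-head scaled dot-product attention without masking; the function names and the random inputs are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries, keys):
    """Row-stochastic matrix P, where P[i][j] is the attention of token i on token j."""
    d = len(queries[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]


def step(dist, P):
    """One Markov chain step: new_dist = dist @ P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(P, max_steps=10_000, tol=1e-12):
    """Power iteration from the uniform distribution (assumed setup, for illustration)."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(max_steps):
        nxt = step(dist, P)
        if sum(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


# Hypothetical toy inputs: 5 tokens with 8-dimensional queries and keys.
random.seed(0)
n_tokens, dim = 5, 8
Q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
K = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

P = attention_matrix(Q, K)        # one-step transition probabilities
pi = stationary_distribution(P)   # long-run share of "visits" per token
```

Because every softmax entry is strictly positive, the chain in this toy setup is irreducible and aperiodic, so the power iteration converges to a unique stationary distribution; multi-step behavior can be inspected by applying `step` repeatedly.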