Associate Professor, Tel Aviv University
1 paper at NeurIPS 2025
We interpret attention as a discrete-time Markov chain and demonstrate the effectiveness of this view on various downstream tasks.
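As a rough illustration of the general idea (not the paper's actual method, which is not described here): each row of a softmax attention matrix is non-negative and sums to 1, so the matrix can be read as the transition matrix of a discrete-time Markov chain over tokens. The sketch below assumes random queries and keys, and the helper names `attention_matrix`, `step`, and `stationary` are hypothetical.

```python
import math
import random


def attention_matrix(queries, keys):
    """Row-wise softmax of scaled dot-product scores (one row per query)."""
    d = len(queries[0])
    matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


def step(dist, P):
    """One Markov-chain step: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, iters=1000, tol=1e-12):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = step(dist, P)
        if sum(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


# Toy data (assumption): 5 tokens with 8-dimensional random queries and keys.
random.seed(0)
n_tokens, dim = 5, 8
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

P = attention_matrix(queries, keys)  # row-stochastic transition matrix
pi = stationary(P)                   # long-run token distribution under P
```

Because every softmax entry is strictly positive, the chain defined by `P` is irreducible and aperiodic, so power iteration converges to a unique stationary distribution.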