We analyze the flow of tokens across attention layers and use these insights to improve the performance of Transformers.
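One common way to trace token flow across attention layers is attention rollout, which multiplies per-layer attention maps (with residual connections folded in) to estimate how much each input token contributes to each position at the final layer. The sketch below is a minimal, hypothetical illustration of that idea using random attention maps; the abstract does not specify the paper's actual analysis method, and the `attention_rollout` function and its head-averaging convention are assumptions.

```python
import numpy as np

def attention_rollout(attentions):
    """Trace token flow across layers by composing attention maps.

    attentions: list of per-layer arrays of shape (heads, seq, seq),
    each row a distribution over source tokens.
    Returns a (seq, seq) map of input-to-output token contributions.
    """
    seq = attentions[0].shape[-1]
    rollout = np.eye(seq)
    for attn in attentions:
        # Average over heads: (heads, seq, seq) -> (seq, seq)
        a = attn.mean(axis=0)
        # Fold in the residual connection, then renormalize rows
        a = 0.5 * a + 0.5 * np.eye(seq)
        a = a / a.sum(axis=-1, keepdims=True)
        rollout = a @ rollout
    return rollout

# Toy example: 3 layers, 4 heads, 6 tokens of random attention
rng = np.random.default_rng(0)
layers = [rng.random((4, 6, 6)) for _ in range(3)]
# Normalize rows so each map is a valid attention distribution
layers = [a / a.sum(axis=-1, keepdims=True) for a in layers]
r = attention_rollout(layers)
print(r.shape)  # each row of r is a distribution over input tokens
```

Each row of the resulting map sums to one, so it can be read as the fraction of a final-layer position's representation attributable to each input token.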