1 paper across 1 session
Accelerating attention for long-context reasoning by identifying and loading important tokens and by approximating attention to less important tokens