3 papers across 3 sessions
We use random matrix theory to estimate the spectral density of matrices too large to fit into memory.
Accelerating attention for long-context reasoning by identifying and loading important tokens and by approximating attention to less important tokens.