2 papers across 2 sessions
We use random matrix theory to estimate the spectral density of matrices too large to fit into memory.
Accelerating attention for long-context reasoning by identifying and loading important tokens and by approximating attention to less important tokens.