We propose SALS, a sparse attention framework operating in a latent space, which enables low-rank KV cache compression with minimal reconstruction overhead, achieving up to 6.4× KV cache compression and a 5.7× attention speed-up without sacrificing accuracy.
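
To make the idea concrete, the sketch below illustrates one plausible reading of latent-space sparse attention: keys and values are cached in a low-rank latent space, token selection is scored cheaply in that space, and only the selected tokens are reconstructed for full attention. This is a minimal illustration, not the paper's released implementation; the projection matrices `W_down`/`W_up`, the rank `r`, and the top-k selection rule are assumptions made for this example.

```python
# Minimal sketch (assumed, not the paper's code) of low-rank KV cache
# compression with sparse attention scored in the latent space.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d, r, T, k = 64, 16, 256, 32          # head dim, latent rank, cached tokens, tokens kept
W_down = torch.randn(d, r) / d**0.5   # hypothetical down-projection (d -> r)
W_up = torch.randn(r, d) / r**0.5     # hypothetical up-projection   (r -> d)

# Cache keys/values in the low-rank latent space: O(T*r) memory instead of O(T*d).
K, V = torch.randn(T, d), torch.randn(T, d)
K_lat, V_lat = K @ W_down, V @ W_down          # compressed KV cache

def sparse_latent_attention(q):
    """Attend over the compressed cache for a single query vector q of shape [d]."""
    q_lat = q @ W_down                          # project the query into the latent space
    scores = K_lat @ q_lat / r**0.5             # cheap latent-space scores over all T tokens
    idx = scores.topk(k).indices                # keep only the k highest-scoring tokens
    # Reconstruct just the selected keys/values, keeping overhead at O(k*d).
    K_sel = K_lat[idx] @ W_up
    V_sel = V_lat[idx] @ W_up
    attn = F.softmax((K_sel @ q) / d**0.5, dim=-1)
    return attn @ V_sel

out = sparse_latent_attention(torch.randn(d))
print(out.shape)                                # torch.Size([64])
```

In this reading, the compression ratio comes from storing the cache at rank `r` rather than dimension `d`, and the speed-up from scoring all tokens in the cheaper latent space while reconstructing only the top-k tokens for exact attention.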