We propose SALS, a sparse attention framework operating in a latent space, which enables low-rank KV cache compression with minimal reconstruction overhead, achieving up to 6.4× KV cache compression and a 5.7× attention speed-up without sacrificing accuracy.
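
To make the idea concrete, the sketch below illustrates one plausible reading of latent-space sparse attention: keys and values are cached in a low-rank latent space, token selection is scored cheaply in that space, and only the selected tokens are reconstructed for full attention. This is a minimal illustration, not the paper's released implementation; the projection matrices `W_down`/`W_up`, the rank `r`, and the top-k selection rule are assumptions made for this example.

```python
# Minimal sketch (assumed, not the paper's code) of low-rank KV cache
# compression with sparse attention scored in the latent space.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d, r, T, k = 64, 16, 256, 32          # head dim, latent rank, cached tokens, tokens kept
W_down = torch.randn(d, r) / d**0.5   # hypothetical down-projection (d -> r)
W_up = torch.randn(r, d) / r**0.5     # hypothetical up-projection   (r -> d)

# Cache keys/values in the low-rank latent space: O(T*r) memory instead of O(T*d).
K, V = torch.randn(T, d), torch.randn(T, d)
K_lat, V_lat = K @ W_down, V @ W_down          # compressed KV cache

def sparse_latent_attention(q):
    """Attend over the compressed cache for a single query vector q of shape [d]."""
    q_lat = q @ W_down                          # project the query into the latent space
    scores = K_lat @ q_lat / r**0.5             # cheap latent-space scores over all T tokens
    idx = scores.topk(k).indices                # keep only the k highest-scoring tokens
    # Reconstruct just the selected keys/values, keeping overhead at O(k*d).
    K_sel = K_lat[idx] @ W_up
    V_sel = V_lat[idx] @ W_up
    attn = F.softmax((K_sel @ q) / d**0.5, dim=-1)
    return attn @ V_sel

out = sparse_latent_attention(torch.randn(d))
print(out.shape)                                # torch.Size([64])
```

In this reading, the compression ratio comes from storing the cache at rank `r` rather than dimension `d`, and the speed-up from scoring all tokens in the cheaper latent space while reconstructing only the top-k tokens for exact attention.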