MS student, Shanghai University of Finance and Economics
1 paper at NeurIPS 2025
We propose a KV cache compression method that exploits an asymmetry in attention: keys are locally homogeneous, whereas values are heterogeneous. Exploiting this asymmetry enables more efficient long-context processing.