Researcher, Shanghai Artificial Intelligence Laboratory
1 paper at NeurIPS 2025
We propose a method that exploits KV cache sparsity efficiently and dynamically through Top-P sampling.