We propose a method that exploits KV cache sparsity efficiently and dynamically through Top-P sampling.
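To make the idea concrete, the following is a minimal sketch of one way Top-P selection could be applied to a KV cache. It rests on an assumption not stated above: that for each query, the method keeps the smallest set of cached tokens whose softmax attention weights sum to at least a threshold p. The function names `softmax` and `top_p_kv_indices`, and the default `p=0.9`, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_kv_indices(scores, p=0.9):
    """Return the indices of the cached tokens to keep for one query.

    Assumption (illustrative only): keep the smallest set of tokens whose
    attention weights sum to at least p, taking tokens in descending order
    of weight.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not scores:
        return []
    weights = softmax(scores)
    # Visit cache positions from the largest attention weight to the smallest.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += weights[i]
        if mass >= p:
            break
    # Return positions in cache order so they can be gathered directly.
    return sorted(kept)


peaked = top_p_kv_indices([10.0, 0.0, 0.0, 0.0], p=0.9)
flat = top_p_kv_indices([0.0, 0.0, 0.0, 0.0], p=0.9)
```

Under this reading, the number of retained entries adapts to the attention distribution rather than being fixed in advance: the peaked example keeps a single token, while the flat example keeps all four.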