Full Professor, University of Science and Technology of China
1 paper at NeurIPS 2025
Optimizes KV cache eviction by adaptively allocating budgets across attention heads for efficient LLM inference
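To make the idea concrete, here is a minimal NumPy sketch of adaptive per-head budget allocation for KV cache eviction. The function names (`allocate_head_budgets`, `evict_kv`) and the entropy-based importance criterion are illustrative assumptions, not the paper's actual method; it only shows the general pattern of splitting one total budget unevenly across heads and then keeping the most-attended tokens per head.

```python
import numpy as np

def allocate_head_budgets(attn_scores, total_budget, min_budget=4):
    """Split a total KV-cache budget across heads in proportion to how
    dispersed each head's attention is (a hypothetical criterion).

    attn_scores: (num_heads, seq_len) attention weights each head assigns
                 to cached tokens (rows sum to 1).
    total_budget: total number of KV entries to keep across all heads.
    """
    num_heads, seq_len = attn_scores.shape
    # Importance proxy: heads with flat (high-entropy) attention arguably
    # need more cache entries than heads that focus on a few tokens.
    entropy = -np.sum(attn_scores * np.log(attn_scores + 1e-9), axis=1)
    weights = entropy / entropy.sum()
    budgets = np.maximum(min_budget,
                         np.floor(weights * total_budget).astype(int))
    return np.minimum(budgets, seq_len)

def evict_kv(keys, values, attn_scores, total_budget):
    """Keep only the top-scoring tokens per head, under per-head budgets.

    keys/values: (num_heads, seq_len, head_dim) cached tensors.
    Returns per-head lists, since heads keep different numbers of tokens.
    """
    budgets = allocate_head_budgets(attn_scores, total_budget)
    kept_keys, kept_vals = [], []
    for h, b in enumerate(budgets):
        keep = np.argsort(attn_scores[h])[-b:]  # most-attended token indices
        keep.sort()                             # preserve positional order
        kept_keys.append(keys[h, keep])
        kept_vals.append(values[h, keep])
    return kept_keys, kept_vals
```

The key design point this sketch captures is that eviction decisions are no longer uniform: a head whose attention is spread over many tokens receives a larger share of the cache than a head that attends to only a few, so the same total memory budget loses less information.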