Associate Professor, Xiamen University
3 papers at NeurIPS 2025
KV cache retrieval for large language models using a nonlinear hashing function.
We propose a superior MoE pruning framework that determines the importance of experts in MoE models from a theoretical perspective.