Research Professor, University of Science and Technology of China
2 papers at NeurIPS 2025
We propose DuetGraph, a dual-pathway global-local fusion model with coarse-to-fine optimization that mitigates over-smoothing in KG reasoning, achieving SOTA performance with up to an 8.7% improvement in quality and a 1.8× acceleration.
Optimizes KV cache eviction by adaptively allocating budgets across attention heads for efficient LLM inference.