Assistant Professor, Tsinghua University
5 papers at NeurIPS 2025
DIET makes LLMs more token-efficient by using problem difficulty to dynamically guide compression during reinforcement learning, improving reasoning performance and enabling better inference-time scaling under fixed token budgets.
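A minimal sketch of the idea of difficulty-guided compression in an RL reward: easy problems receive stronger length-penalty pressure, hard problems are allowed longer reasoning chains. The function name `diet_reward`, the `difficulty` scale, and the penalty form are illustrative assumptions, not the paper's actual reward.

```python
# Illustrative sketch only: a length penalty scaled by problem difficulty,
# in the spirit of difficulty-guided compression (not the DIET implementation).

def diet_reward(correct: bool, num_tokens: int, difficulty: float,
                base_penalty: float = 0.001) -> float:
    """Task reward minus a length penalty that shrinks as difficulty grows.

    difficulty in [0, 1]: 0 = easy (compress aggressively),
    1 = hard (allow long chains of thought with no penalty).
    """
    task_reward = 1.0 if correct else 0.0
    # Easy problems get strong compression pressure; hard ones get slack.
    penalty_weight = base_penalty * (1.0 - difficulty)
    return task_reward - penalty_weight * num_tokens
```

Under this toy scheme, a correct 100-token answer to an easy problem scores lower than the same-length answer to a hard one, which is the pressure that drives the policy toward shorter solutions where shortness is affordable.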
We introduce PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, and an improved evaluation framework combining rule- and model-based judgments to advance LLMs' physical reasoning.
We propose Learning to Focus (LeaF), which identifies and masks confounding tokens via gradient-based comparisons, thereby improving long-context reasoning accuracy and interpretability in large language models.
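A minimal sketch of gradient-based confounder masking: given per-token attribution scores from two models, mask the tokens whose influence is high in one but low in the other. The function name `mask_confounding_tokens`, the use of a student/teacher saliency gap, and the top-k selection are illustrative assumptions, not LeaF's actual procedure.

```python
# Illustrative sketch only: masking tokens by comparing two sets of
# gradient-based saliency scores (not the LeaF implementation).
import numpy as np

def mask_confounding_tokens(student_sal, teacher_sal, ratio=0.1):
    """Return a boolean keep-mask over tokens.

    Tokens where the student's attribution most exceeds the teacher's
    are treated as confounders and masked out (False).
    """
    gap = np.asarray(student_sal) - np.asarray(teacher_sal)
    k = max(1, int(len(gap) * ratio))
    confounders = np.argsort(gap)[-k:]  # largest positive gaps
    mask = np.ones(len(gap), dtype=bool)
    mask[confounders] = False
    return mask
```

The comparison-based selection is the key design choice: a single model's saliency alone cannot distinguish genuinely useful tokens from distracting ones, while a gap between two attribution profiles can flag tokens that attract attention without contributing to the correct answer.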