Assistant Professor, Tsinghua University
3 papers at NeurIPS 2025
We show that knowledge acquisition under data mixing can exhibit phase transitions with respect to the mixing ratio and model size.
This paper introduces a novel metric (REG) for evaluating the reasoning efficiency of LRMs and a reinforcement learning method (REO-RL) that significantly reduces reasoning redundancy while maintaining accuracy.