Researcher, Beijing Institute for General Artificial Intelligence
1 paper at NeurIPS 2025
Self-play reasoning RL with no data can achieve SOTA results compared with RL models trained on human data.