MS student, Beijing University of Aeronautics and Astronautics
1 paper at NeurIPS 2025
We construct a Progress Reward Model with a convergence guarantee for Reinforcement Learning via Large Language Models.