PhD student, University of Chinese Academy of Sciences
1 paper at NeurIPS 2025
A novel algorithm that estimates fine-grained, token-level advantages in reinforcement learning without introducing additional models.