PhD student, Institute of Automation, Chinese Academy of Sciences
1 paper at NeurIPS 2025
A novel algorithm that estimates fine-grained, token-level advantages in reinforcement learning without introducing additional models.