Assistant Professor, Institute of Automation, Chinese Academy of Sciences
1 paper at NeurIPS 2025
A novel algorithm that estimates fine-grained, token-level advantages in reinforcement learning without introducing additional models.