1 paper across 1 session
A novel algorithm that estimates fine-grained, token-level advantages in reinforcement learning without introducing additional models.