2 papers across 2 sessions
A novel algorithm that estimates fine-grained, token-level advantages in reinforcement learning without introducing additional models.