Associate Professor, University of Michigan - Ann Arbor
1 paper at NeurIPS 2025
We propose BREAD, a novel and effective variant of GRPO that bridges supervised learning and reinforcement learning by employing branch rollouts from expert traces.