Assistant Professor, New Jersey Institute of Technology
2 papers at NeurIPS 2025
We propose BREAD, a variant of GRPO that bridges supervised learning and reinforcement learning by branching rollouts from expert traces.