PhD student, Pennsylvania State University
1 paper at NeurIPS 2025
Self-play RL for reasoning, trained with no data, can achieve SOTA results against RL models trained on human data