Postdoc, University of Oxford
1 paper at NeurIPS 2025
Proved, with theory and empirical results, that prioritising training on questions of "medium" difficulty is beneficial when training reasoning models with RL