1 paper across 1 session
Prove, with theory and empirical results, that prioritising training on questions of "medium" difficulty is beneficial when training reasoning models with RL.
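
One way to make the intuition behind this claim concrete is the following minimal sketch. It rests on assumptions not stated in the source: the reward is binary (1 for a correct answer, 0 otherwise), a question's difficulty is summarised by the current policy's success probability p, and advantages are computed against a group-mean baseline (as in GRPO-style methods). The function names are hypothetical. Under these assumptions the reward variance is p(1 - p), which peaks at p = 0.5, and a group of G rollouts gives no learning signal when every rollout receives the same reward.

```python
def reward_variance(p: float) -> float:
    """Variance of a Bernoulli(p) reward (assumed binary correctness reward)."""
    return p * (1.0 - p)


def prob_no_signal(p: float, group_size: int) -> float:
    """Probability that all rollouts in a group get the same reward.

    With a group-mean baseline (an assumption here), identical rewards
    give zero advantages and hence no gradient for that question.
    """
    return p ** group_size + (1.0 - p) ** group_size


if __name__ == "__main__":
    group_size = 8  # hypothetical number of rollouts per question
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(
            f"p={p:.2f}  variance={reward_variance(p):.4f}  "
            f"P(no signal)={prob_no_signal(p, group_size):.4f}"
        )
```

This only illustrates why very easy and very hard questions may contribute little signal under the stated assumptions; it is not a proof, and it says nothing about the empirical results the task asks for.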