PhD student, University of Illinois at Urbana-Champaign
1 paper at NeurIPS 2025
We propose two techniques to improve the data efficiency of reinforcement learning (RL) fine-tuning for large language models (LLMs): difficulty-targeted online data selection and rollout replay.