Intern, University of Illinois at Urbana-Champaign
Two papers accepted at NeurIPS 2025
We propose two techniques to improve the data efficiency of LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay.
We propose a Training-Free Bayesianization approach for LLM adapters that achieves improved uncertainty estimation.