PhD student, University of Texas at Austin
2 papers at NeurIPS 2025
We propose two techniques to improve the data efficiency of LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay.
We propose a new method to test equality between the true and estimated posterior distributions, establishing necessary and sufficient conditions for distributional equivalence, with both theoretical guarantees and practical scalability.