Researcher, Microsoft
4 papers at NeurIPS 2025
We propose two techniques to improve the data efficiency of LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay.
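The two techniques can be illustrated with a toy sketch. This is not the paper's algorithm; every name here (the difficulty score, `select_batch`, `RolloutReplayBuffer`) is a hypothetical stand-in, assuming difficulty is estimated from each prompt's observed pass rate and that rollouts are reused from a bounded buffer.

```python
import random

def estimated_difficulty(pass_rate):
    # Assumption: prompts with a pass rate near 0.5 are treated as most
    # informative for policy-gradient updates; extremes score near 0.
    return 1.0 - abs(pass_rate - 0.5) * 2.0

def select_batch(prompts, pass_rates, batch_size):
    """Pick the prompts whose current pass rate is closest to 0.5."""
    ranked = sorted(prompts, key=lambda p: -estimated_difficulty(pass_rates[p]))
    return ranked[:batch_size]

class RolloutReplayBuffer:
    """Bounded FIFO cache of recent rollouts, so each generated rollout
    can be reused across several updates instead of being sampled once."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def add(self, rollout):
        self.buffer.append(rollout)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)  # drop the stalest rollout

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))
```

For example, with pass rates `{"a": 0.05, "b": 0.5, "c": 0.95, "d": 0.4}`, `select_batch` with size 2 keeps `b` and `d`, the two prompts nearest the 0.5 difficulty target.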
We introduce a new method for training autoregressive video diffusion models by performing self-rollout with KV caching during training.
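A minimal sketch of the caching idea, under stated assumptions: the `KVCache`, `attend`, and `self_rollout` names are hypothetical, the "denoiser" is a placeholder scaling step, and attention is plain dot-product softmax over lists. The point it shows is only that each generated frame appends its keys/values to a cache, so the next step attends over the full history without re-encoding it.

```python
import math

class KVCache:
    """Append-only cache of per-frame keys and values (toy version)."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def attend(query, cache):
    """Softmax attention of the current frame over all cached frames."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in cache.keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(cache.values[0])
    # Weighted sum of cached values.
    return [sum(w * v[i] for w, v in zip(weights, cache.values)) for i in range(dim)]

def self_rollout(first_frame, num_frames, project):
    """Generate frames autoregressively during training; each step reuses
    the KV cache instead of re-encoding the whole frame history."""
    cache = KVCache()
    frames = [first_frame]
    k, v = project(first_frame)
    cache.append(k, v)
    for _ in range(num_frames - 1):
        context = attend(frames[-1], cache)
        next_frame = [0.5 * c for c in context]  # placeholder denoiser step
        frames.append(next_frame)
        k, v = project(next_frame)  # cache grows by one entry per frame
        cache.append(k, v)
    return frames
```

With an identity-like projection such as `lambda f: (list(f), list(f))`, a three-frame rollout performs two attention steps, each over a cache that already holds every earlier frame.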