Senior Research Scientist, Adobe Research
1 paper at NeurIPS 2025
We recast offline RL as reward-weighted fine-tuning, which makes RL optimization of LLM agents practical using only supervised fine-tuning (SFT).
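
To make the idea of reward-weighted fine-tuning concrete, here is a minimal sketch. It assumes the weight is a nonnegative scalar reward per logged trajectory that multiplies the usual SFT sequence log-likelihood; the paper's exact objective may differ. The names `log_softmax` and `reward_weighted_nll` are hypothetical and chosen for illustration only.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last (vocabulary) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def reward_weighted_nll(logits, targets, rewards):
    """Reward-weighted negative log-likelihood (illustrative assumption).

    logits:  (batch, seq, vocab) model scores for each position
    targets: (batch, seq) integer token ids from logged trajectories
    rewards: (batch,) nonnegative reward per logged trajectory
    """
    logp = log_softmax(logits)
    # Log-probability of each logged token: shape (batch, seq).
    token_logp = np.take_along_axis(logp, targets[..., None], axis=-1).squeeze(-1)
    # Sequence log-likelihood: sum over token positions.
    seq_logp = token_logp.sum(axis=-1)
    # Weight each trajectory's log-likelihood by its reward, then average.
    return float(-(rewards * seq_logp).mean())


# Toy batch: 2 trajectories, 3 tokens each, vocabulary of 4, uniform logits.
logits = np.zeros((2, 3, 4))
targets = np.array([[0, 1, 2], [3, 2, 1]])
weighted = reward_weighted_nll(logits, targets, np.array([1.0, 0.0]))
plain_sft = reward_weighted_nll(logits, targets, np.ones(2))
```

With all rewards set to 1 the loss reduces to the ordinary SFT loss, which is why, under this assumption, an existing SFT training loop could be reused with only a per-example weight added.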