Research Scientist, Adobe Systems
2 papers at NeurIPS 2025
We recast offline RL as reward-weighted fine-tuning, which allows practical RL optimization of LLM agents using just SFT.
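A minimal sketch of what a reward-weighted fine-tuning objective can look like. Each logged trajectory's SFT negative log-likelihood is scaled by a weight derived from its reward, so uniform rewards recover ordinary SFT. The exponentiated, mean-normalized weighting, the `beta` temperature, and the function names are illustrative assumptions, not details taken from the papers.

```python
import math
from typing import List, Sequence


def reward_weights(rewards: Sequence[float], beta: float = 1.0) -> List[float]:
    """Exponentiated-reward weights normalized to mean 1.

    Assumed scheme (in the style of reward-weighted regression); the actual
    weighting used in the papers may differ.
    """
    # Subtract the max reward before exponentiating for numerical stability.
    m = max(rewards)
    raw = [math.exp((r - m) / beta) for r in rewards]
    total = sum(raw)
    n = len(raw)
    # Rescale so the weights average to 1 across the batch.
    return [n * w / total for w in raw]


def reward_weighted_sft_loss(
    token_logprobs: Sequence[Sequence[float]],
    rewards: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Reward-weighted negative log-likelihood over a batch of logged trajectories.

    token_logprobs[i] holds log p(y_t | x, y_<t) for each action token of
    trajectory i. When all weights equal 1, this is the plain SFT loss.
    """
    weights = reward_weights(rewards, beta)
    # Per-trajectory NLL, scaled by that trajectory's reward weight.
    losses = [-w * sum(lp) for w, lp in zip(weights, token_logprobs)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    logps = [[-1.0, -2.0], [-0.5]]
    print(reward_weighted_sft_loss(logps, [0.0, 0.0]))  # equal rewards: plain SFT
    print(reward_weighted_sft_loss(logps, [1.0, 0.0]))  # upweights trajectory 0
```

This version only computes the scalar loss. In an actual training loop the log-probabilities would come from the model's forward pass inside an autodiff framework, so gradients flow through them while the weights are treated as constants. That is what lets an existing SFT pipeline perform the update unchanged.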