Researcher, Adobe Research
2 papers at NeurIPS 2025
We recast offline RL as reward-weighted fine-tuning, which allows practical RL optimization of LLM agents using just supervised fine-tuning (SFT).
We introduce transductive program synthesis: synthesizing programs using test inputs.