Postdoc, Department of Computer Science, University of Toronto
1 paper at NeurIPS 2025
We introduce STITCH-OPE, a guided-diffusion framework for off-policy evaluation. It stitches together short behavior-conditioned sub-trajectories, uses negative-behavior guidance to correct distribution shift, and outperforms the baselines on every reported metric.