Researcher, Microsoft
2 papers at NeurIPS 2025
We develop trust region methods for stochastic optimal control to improve sampling from unnormalized densities, transition path sampling, and diffusion model finetuning.
We propose a value gradient matching formulation, grounded in optimal control theory, for reward finetuning (alignment) of flow matching models, and empirically verify our method on the popular text-to-image flow matching model Stable Diffusion 3.