2 papers across 2 sessions
We extend a recently proposed RL-based, gradient-informed finetuning method to the task of reward finetuning/alignment for 3D-native diffusion models.
We propose a value gradient matching formulation for reward finetuning/alignment of flow matching models, grounded in optimal control theory, and empirically validate our method on the popular text-to-image flow matching model Stable Diffusion 3.