4 papers across 2 sessions
We derive well-known learning rules from an objective that casts learning rules as policies for navigating uncertain loss landscapes.
We propose a value gradient matching formulation for reward finetuning/alignment of flow matching models, grounded in the theory of optimal control, and empirically verify our method on the popular text-to-image flow matching model Stable Diffusion 3.