We propose a policy gradient algorithm for fine-tuning discrete diffusion models over non-differentiable rewards.