Assistant Professor, Imperial College London
1 paper at NeurIPS 2025
We propose a policy gradient algorithm for fine-tuning discrete diffusion models over non-differentiable rewards.