We propose GenPO, which incorporates an invertible diffusion model into on-policy RL and addresses the challenge of computing log-likelihoods in diffusion policies.
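To illustrate why invertibility helps with log-likelihood computation, the following is a minimal sketch, not GenPO's actual architecture: a single affine coupling step on a 2-D action, where the exact log-density follows from the change-of-variables formula. On-policy methods such as PPO rely on likelihood ratios, which is why an exact log-probability is useful. The function names and the parameters `w` and `b` are hypothetical and chosen only for this example.

```python
import math

# Illustrative only: one invertible affine coupling step, NOT the GenPO model.
# w and b are hypothetical scalar parameters standing in for a learned network.

def coupling_forward(z, w, b):
    """Map base noise z = (z1, z2) to an action a; also return log|det J|."""
    z1, z2 = z
    log_scale = math.tanh(w * z1)          # bounded log-scale conditioned on z1
    a = (z1, z2 * math.exp(log_scale) + b * z1)
    return a, log_scale                    # triangular Jacobian: det = exp(log_scale)

def coupling_inverse(a, w, b):
    """Recover z from a exactly; also return log|det J| of the forward map."""
    a1, a2 = a
    log_scale = math.tanh(w * a1)          # a1 == z1, so the scale is recoverable
    z = (a1, (a2 - b * a1) * math.exp(-log_scale))
    return z, log_scale

def log_prob(a, w, b):
    """Exact log-density of a under a standard normal base distribution."""
    z, log_det = coupling_inverse(a, w, b)
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_base - log_det              # change of variables
```

Calling `log_prob(a, w, b)` under the old and the new parameters gives the two terms of a likelihood ratio without any approximation, which is the property a non-invertible sampler lacks.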