2 papers across 2 sessions
We present a time scheduler that selects sampling points based on entropy rather than uniform time spacing, ensuring each point contributes an equal amount of information to the final generation.
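The equal-information idea above can be sketched as inverting a normalized cumulative-entropy curve: instead of spacing timesteps uniformly, pick the times at which cumulative entropy crosses equally spaced fractions. This is a minimal illustration under an assumed toy entropy-rate function (the paper's actual entropy estimate is not given here); `entropy_spaced_times` and the exponential density are hypothetical names for illustration only.

```python
import numpy as np

def entropy_spaced_times(entropy_density, n_steps, grid_size=1000):
    # Evaluate the (assumed) per-time entropy rate on a fine grid over [0, 1].
    t = np.linspace(0.0, 1.0, grid_size)
    h = entropy_density(t)
    # Normalized cumulative entropy: fraction of total information up to time t.
    H = np.cumsum(h)
    H = H / H[-1]
    # Equal-information targets, excluding the endpoints 0 and 1.
    targets = np.linspace(0.0, 1.0, n_steps + 2)[1:-1]
    # Invert the cumulative curve: the time at which each target fraction is reached.
    return np.interp(targets, H, t)

# Toy entropy rate concentrated near t = 0 (purely illustrative assumption);
# the resulting schedule clusters sampling points where entropy accrues fastest.
times = entropy_spaced_times(lambda t: np.exp(-4.0 * t), n_steps=8)
```

With this decaying toy density, most of the eight selected times land in the early part of the interval, unlike a uniform schedule.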
We propose GenPO, which effectively incorporates invertible diffusion models into on-policy RL and addresses the challenge of log-likelihood computation in diffusion policies.
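Why invertibility helps: for an invertible map, the change-of-variables formula gives an exact log-likelihood, log p(x) = log p(z) - log|det J|, which on-policy RL objectives need. The sketch below shows this for a one-layer affine flow with a standard-normal base; the function name and the affine form are illustrative assumptions, not GenPO's actual architecture.

```python
import numpy as np

def affine_flow_logprob(x, scale, shift):
    # Invertible elementwise affine map: x = scale * z + shift, base z ~ N(0, I).
    z = (x - shift) / scale                          # exact inverse of the flow
    base_logp = -0.5 * (z**2 + np.log(2 * np.pi))    # standard-normal log-density per dim
    log_det = np.log(np.abs(scale))                  # log |dx/dz| per dimension
    # Change of variables: subtract the log-det of the forward Jacobian.
    return np.sum(base_logp - log_det)

x = np.array([0.3, -1.2])
lp = affine_flow_logprob(x, scale=np.array([2.0, 0.5]), shift=np.array([0.1, -0.3]))
```

The same formula composes across layers (log-dets add), which is what makes exact likelihoods tractable for invertible models where a standard diffusion policy only admits a likelihood bound.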