4 papers across 3 sessions
We find that rerouting spurious shortcuts during adapter training enables robust disentanglement in text-to-image generation.
We propose Flow-GRPO, the first method to integrate online RL into flow-matching models, significantly enhancing text-to-image generation performance.
We present a learning framework that aligns text-to-image diffusion models with human preferences via inverse reinforcement learning, balancing offline and online training.