2 papers across 2 sessions
We compare the RL methods DPO & GRPO for image generation, showing their respective strengths & how reward design affects generalization, and explore scaling for better CoT-based synthesis.