We compare the RL methods DPO and GRPO for image generation, analyzing their respective strengths and how the choice of reward model affects generalization, and we explore scaling strategies for better CoT-based synthesis.
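To make the contrast concrete, here is a minimal sketch of the two objectives being compared: DPO optimizes a pairwise preference loss against a frozen reference policy, while GRPO computes group-normalized advantages from sampled rewards. All function names, the `beta` value, and the example log-probabilities below are illustrative assumptions, not details from the paper.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO: -log sigmoid(beta * margin), where the margin is the difference
    # in (policy - reference) log-probs between the chosen (w) and
    # rejected (l) sample. beta=0.1 is an illustrative default.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def grpo_advantages(rewards):
    # GRPO: normalize rewards within a group of samples generated for the
    # same prompt; the normalized values serve as per-sample advantages.
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Illustrative numbers: a preference pair and a group of four reward scores.
loss = dpo_loss(logp_w=-1.0, logp_l=-2.0, ref_logp_w=-1.5, ref_logp_l=-1.5)
advs = grpo_advantages([0.2, 0.8, 0.5, 0.9])
```

The key practical difference this sketch highlights: DPO needs paired preference data and a reference model, whereas GRPO only needs a scalar reward per sample and a group of rollouts to normalize over.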