Undergraduate student, Peking University
1 paper at NeurIPS 2025
We compare the RL methods DPO & GRPO for image generation, showing their respective strengths & how reward design affects generalization, and explore scaling toward better CoT-based synthesis.