PhD student, The Chinese University of Hong Kong
1 paper at NeurIPS 2025
We compare the RL methods DPO and GRPO for image generation, analyzing their respective strengths and how reward design affects generalization, and explore scaling for better CoT-based synthesis.