Undergrad student, Xi'an University of Electronic Science and Technology
1 paper at NeurIPS 2025
We compare the RL methods DPO and GRPO for image generation, showing their respective strengths and how rewards affect generalization. We also explore scaling for better CoT-based synthesis.