PhD student, Renmin University of China
1 paper at NeurIPS 2025
Scaling Diffusion Transformers up to 18B Efficiently via $\mu$P