Intern, ByteDance Inc.
1 paper at NeurIPS 2025
Scaling Diffusion Transformers up to 18B Efficiently via $\mu$P