Research Scientist, RIKEN AIP
3 papers at NeurIPS 2025
Scaling Diffusion Transformers up to 18B Efficiently via $\mu$P