Director, Alibaba Group
3 papers at NeurIPS 2025
Scaling Diffusion Transformers up to 18B Efficiently via $\mu$P
We propose a universal video grounding model based on MLLMs, which achieves superior accuracy, generalizability, and robustness.