Algorithm Engineer, Alibaba Group
3 papers at NeurIPS 2025
CoT-Diff tightly couples MLLM reasoning with diffusion to enable step-by-step 3D layout planning and dynamic spatial control within a single generation process.
FPSAttention is a training-aware co-design of FP8 quantization and sparsity for video diffusion models. It achieves up to 4.96× speedup without quality loss by aligning 3D tile granularity, adapting to the denoising step, and using hardware-efficient kernels.