Architect, Alibaba Group
2 papers at NeurIPS 2025
A framework for tensor and pipeline parallelism to reduce TP bubbles.
Efficient Long Context Fine-tuning through Dynamic Data Scheduling