PhD student, Sun Yat-sen University
1 paper at NeurIPS 2025
DynaPipe dynamically redistributes layers and uses asynchronous coordination to balance computation during LLM inference, significantly reducing latency and outperforming existing pipeline-parallel systems.