2 papers across 2 sessions
A framework for tensor and pipeline parallelism that reduces tensor-parallel (TP) bubbles.
DynaPipe dynamically redistributes layers and uses asynchronous coordination to balance computation during LLM inference, significantly reducing latency and outperforming existing pipeline parallelism systems.
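To make the idea of balancing computation across pipeline stages concrete, here is a minimal sketch. It is not DynaPipe's actual algorithm, which this summary does not describe. It assumes each layer has a known, fixed cost and that every stage holds a contiguous run of layers. The function name `balance_layers` and the cost values are hypothetical.

```python
from itertools import accumulate


def balance_layers(layer_costs, num_stages):
    """Split layers into contiguous stages so the slowest stage is as cheap as possible.

    Illustrative only: a static dynamic-programming partition over assumed
    per-layer costs, not the redistribution scheme used by DynaPipe.
    """
    n = len(layer_costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")

    # prefix[i] is the total cost of layers 0..i-1
    prefix = [0, *accumulate(layer_costs)]
    inf = float("inf")

    # best[s][i]: smallest achievable bottleneck when the first i layers use s stages
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    # cut[s][i]: index where the last of those s stages begins
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0

    for s in range(1, num_stages + 1):
        for i in range(s, n + 1):
            for j in range(s - 1, i):
                bottleneck = max(best[s - 1][j], prefix[i] - prefix[j])
                if bottleneck < best[s][i]:
                    best[s][i] = bottleneck
                    cut[s][i] = j

    # Walk the cut table backwards to recover the stage boundaries
    stages, end = [], n
    for s in range(num_stages, 0, -1):
        start = cut[s][end]
        stages.append(list(range(start, end)))
        end = start
    stages.reverse()
    return stages, best[num_stages][n]


# Hypothetical costs: the last two layers are four times as expensive as the rest
stages, bottleneck = balance_layers([1, 1, 1, 1, 4, 4], num_stages=2)
```

With these assumed costs, an even split of three layers per stage leaves the second stage with a cost of 9, while the balanced split assigns four cheap layers to the first stage and brings the bottleneck down to 8. A dynamic system would presumably repeat this kind of decision as measured costs change during inference.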