3 papers across 2 sessions
Training LLMs with tensor parallelism without fully synchronizing activations, to accelerate both training and inference.
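To make the synchronization cost concrete, here is a minimal single-process sketch of where activation synchronization arises in standard tensor parallelism (a generic row-parallel linear layer simulated with NumPy; the papers' specific relaxation of this sync step is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of input activations
W = rng.standard_normal((8, 6))   # full weight matrix of a linear layer

# Row-parallel split: each of two simulated "ranks" holds half the input
# features and the matching rows of W, so each computes a partial output.
x0, x1 = x[:, :4], x[:, 4:]
W0, W1 = W[:4, :], W[4:, :]
partial0 = x0 @ W0
partial1 = x1 @ W1

# Summing the partial outputs stands in for the all-reduce over
# activations -- the per-layer synchronization that standard tensor
# parallelism requires and that these papers aim to avoid or relax.
y_tp = partial0 + partial1

assert np.allclose(y_tp, x @ W)
```

The assertion confirms the sharded computation matches the unsharded matmul; in a real multi-device setup the `partial0 + partial1` line is a cross-device all-reduce, which is the communication bottleneck being targeted.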