1 paper across 1 session
A state-of-the-art sequence parallelism method for linear attention that introduces a new collective communication primitive.