2 papers across 1 session
We introduce TiledFlashLinearAttention, a faster kernel algorithm for Linear RNNs and mLSTMs based on improved sequence parallelism.
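To give a feel for the setting, here is a minimal NumPy sketch of the generic chunkwise (chunk-parallel) form of causal linear attention that kernels in this family build on. It is not the TiledFlashLinearAttention kernel itself: the function name and chunk size are illustrative, and normalization, gating, and all GPU tiling details are omitted.

```python
import numpy as np

def chunkwise_linear_attention(Q, K, V, chunk_size=64):
    """Causal linear attention computed chunk by chunk (illustrative sketch).

    Shapes: Q, K, V are (T, d). Omits normalization and gating.
    """
    T, d = Q.shape
    O = np.zeros_like(V)
    S = np.zeros((d, d))  # running K^T V state carried across chunks
    for start in range(0, T, chunk_size):
        end = min(start + chunk_size, T)
        q, k, v = Q[start:end], K[start:end], V[start:end]
        # inter-chunk part: contribution of all earlier chunks via the state
        O[start:end] = q @ S
        # intra-chunk part: causally masked attention within the chunk
        mask = np.tril(np.ones((end - start, end - start)))
        O[start:end] += ((q @ k.T) * mask) @ v
        # fold this chunk into the state for later chunks
        S += k.T @ v
    return O
```

Because the state update is associative, the chunks expose parallelism along the sequence dimension, which is what kernel-level tiling schemes exploit.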
This paper presents FlashBias, which speeds up the computation of attention with bias, yielding a 1.5x speedup for AlphaFold and a 2x speedup for SwinV2.
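For reference, the sketch below shows the operation being accelerated: standard attention with an additive bias term (e.g., the pair-representation bias in AlphaFold or the relative-position bias in SwinV2). This is the naive baseline, not the FlashBias kernel.

```python
import numpy as np

def attention_with_bias(Q, K, V, B):
    """Naive reference for attention with an additive bias.

    Shapes: Q, K, V are (T, d); B is a (T, T) bias added to the logits.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B               # bias added before softmax
    scores -= scores.max(axis=-1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

The (T, T) bias matrix is what makes this memory-bound for fused kernels such as FlashAttention, which is the bottleneck FlashBias targets.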