2 papers across 2 sessions
We introduce TiledFlashLinearAttention, a faster kernel algorithm for linear RNNs and mLSTMs that builds on improved sequence parallelism.
We frame dynamic regret minimization as a static regret problem in a reproducing kernel Hilbert space (RKHS).