We analyze Mamba's shortcomings in long-range sequence processing from three perspectives (expressiveness, inductive bias, and training stability) and design a remedy, Block-Biased S6, that achieves state-of-the-art performance on long-range tasks.
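For readers unfamiliar with S6, the selective state-space layer at the core of Mamba, here is a minimal sketch of its recurrence. It covers only the standard S6 layer that Block-Biased S6 modifies; it does not implement the block-biased variant. It also makes several simplifying assumptions: a real, diagonal state matrix `A` per channel; a sequential loop instead of Mamba's hardware-aware parallel scan; and a full `D x D` step-size projection instead of Mamba's low-rank one. All function and parameter names (`s6_scan`, `W_B`, `W_C`, `W_delta`, `b_delta`, `D_skip`) are hypothetical. The discretization (`exp(delta * A)` for the state, `delta * B * x` for the input) follows the simplified scheme commonly used in Mamba reference code.

```python
import math


def softplus(z):
    # Numerically stable softplus: log(1 + e^z)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def s6_scan(x, A, W_B, W_C, W_delta, b_delta, D_skip):
    """Sequential selective scan (S6), simplified sketch.

    x:        list of L input vectors, each of length D (channels)
    A:        D x N matrix of negative reals (diagonal state matrix per channel)
    W_B, W_C: D x N projections producing input-dependent B_t and C_t (length N)
    W_delta:  D x D projection producing per-channel step sizes (hypothetical, full-rank)
    b_delta:  length-D bias added before the softplus
    D_skip:   length-D skip-connection weights
    Returns a list of L output vectors of length D.
    """
    n_ch, n_state = len(A), len(A[0])
    # Hidden state: one length-N state vector per channel, initialized to zero
    h = [[0.0] * n_state for _ in range(n_ch)]
    ys = []
    for x_t in x:
        # "Selective" parameters: B_t, C_t and the step size depend on the current input
        B_t = [sum(x_t[d] * W_B[d][n] for d in range(n_ch)) for n in range(n_state)]
        C_t = [sum(x_t[d] * W_C[d][n] for d in range(n_ch)) for n in range(n_state)]
        delta = [
            softplus(sum(x_t[k] * W_delta[k][d] for k in range(n_ch)) + b_delta[d])
            for d in range(n_ch)
        ]
        y_t = []
        for d in range(n_ch):
            acc = 0.0
            for n in range(n_state):
                # Discretized state transition; A < 0 keeps a_bar in (0, 1)
                a_bar = math.exp(delta[d] * A[d][n])
                # State update: decay the old state, inject the scaled input
                h[d][n] = a_bar * h[d][n] + delta[d] * B_t[n] * x_t[d]
                # Readout through the input-dependent C_t
                acc += C_t[n] * h[d][n]
            y_t.append(acc + D_skip[d] * x_t[d])
        ys.append(y_t)
    return ys
```

Because the step size `delta` depends on the input, each token controls how quickly earlier state decays. This input-dependent decay is the property whose long-range behavior the analysis above examines. The recurrence is causal: output `t` depends only on inputs `0..t`.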