Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#3504
Linear Attention for Efficient Bidirectional Sequence Modeling
Abstract
Linear Transformers and State Space Models have emerged as efficient alternatives to softmax Transformers for causal sequence modeling, enabling parallel training via matrix multiplication and efficient RNN-style inference. However, despite their success in causal tasks, no unified framework exists for applying Linear Transformers to bidirectional sequence modeling.
We introduce LION, the first framework to systematically extend Linear Transformers to the bidirectional setting. LION generalizes three core representations commonly used in the causal case—full Linear Attention, bidirectional RNN, and chunkwise parallel form—to the bidirectional setting. These forms are theoretically equivalent and enable models to exploit the strengths of each during training and inference.
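The equivalence between the full-attention and recurrent forms can be illustrated with a minimal sketch. This is not the paper's implementation: it omits feature maps, decay/gating, and normalization, and simply shows that unmasked linear attention (Q K^T) V equals the sum of a forward and a backward causal recurrence, with the double-counted diagonal term subtracted.

```python
import numpy as np

def full_linear_attention(Q, K, V):
    # Parallel form: O = (Q K^T) V, i.e. attention without softmax or mask.
    return (Q @ K.T) @ V

def bidirectional_rnn(Q, K, V):
    # Recurrent form: a forward pass accumulates the prefix state
    # S_t = sum_{s<=t} k_s v_s^T, a backward pass accumulates the suffix
    # state, and each token reads both. The s = t term appears in both
    # passes, so it is subtracted once.
    L, d_v = V.shape
    out = np.zeros((L, d_v))
    S = np.zeros((Q.shape[1], d_v))
    for t in range(L):                      # forward recurrence
        S += np.outer(K[t], V[t])
        out[t] += Q[t] @ S
    S = np.zeros((Q.shape[1], d_v))
    for t in reversed(range(L)):            # backward recurrence
        S += np.outer(K[t], V[t])
        out[t] += Q[t] @ S
        out[t] -= (Q[t] @ K[t]) * V[t]      # remove double-counted diagonal
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
assert np.allclose(full_linear_attention(Q, K, V), bidirectional_rnn(Q, K, V))
```

The parallel form is what makes training efficient (one matrix product over the whole sequence), while the two-pass recurrence processes tokens with constant state size, which is the property exploited at inference time.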
Across standard bidirectional tasks, LION enables models to match or exceed the performance of softmax Transformers, while offering significantly faster training and more efficient inference than existing State Space Models.