Poster Session 3 · Thursday, December 4, 2025, 11:00 AM – 2:00 PM
#3402 Spotlight

Fixed-Point RNNs: Interpolating from Diagonal to Dense


Abstract

Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax attention as sequence-mixing layers in Transformer architectures.
Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e., diagonal) sequence mixing. In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed points of parallelizable diagonal linear RNNs.
The resulting models can naturally trade expressivity for efficiency at a fixed number of parameters, and achieve state-of-the-art results on the state-tracking benchmarks $A_5$ and $S_5$, while matching performance on copying and other tasks.
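
To make the construction concrete, here is a minimal NumPy sketch of the general fixed-point idea, not the paper's actual parameterization: names such as `diagonal_scan` and `dense_rnn_via_fixed_point` are ours, and the sequential loop stands in for the parallel scan. A dense transition matrix $A$ is split into its diagonal $\Lambda$ and off-diagonal residual $R = A - \Lambda$; each fixed-point iteration runs one diagonal linear RNN whose inputs are driven by the previous iterate, and at the fixed point the iterate satisfies the dense recurrence $h_t = A h_{t-1} + B x_t$. Truncating the iteration count is one natural way to trade expressivity for efficiency.

```python
import numpy as np

def diagonal_scan(lam, u):
    """Channel-wise (diagonal) linear RNN: h_t = lam * h_{t-1} + u_t.

    Written as a sequential loop for clarity; this scan is the part
    that would be parallelized in practice.
    """
    T, d = u.shape
    h = np.zeros((T, d))
    prev = np.zeros(d)
    for t in range(T):
        prev = lam * prev + u[t]
        h[t] = prev
    return h

def dense_rnn_via_fixed_point(A, B, x, n_iters=50):
    """Approximate the dense recurrence h_t = A h_{t-1} + B x_t by
    iterating a diagonal scan to a fixed point.

    For a length-T sequence the iterate is exact once n_iters >= T;
    fewer iterations give a cheaper, truncated approximation.
    """
    d = A.shape[0]
    lam = np.diag(A)            # diagonal part of A
    R = A - np.diag(lam)        # off-diagonal residual
    u_in = x @ B.T              # input drive B x_t, shape (T, d)
    T = x.shape[0]
    h = np.zeros((T, d))        # initial guess h^(0) = 0
    for _ in range(n_iters):
        # supply h^{(k)}_{t-1} by shifting the previous iterate in time
        h_shift = np.vstack([np.zeros((1, d)), h[:-1]])
        # each iteration is one (parallelizable) diagonal scan
        h = diagonal_scan(lam, h_shift @ R.T + u_in)
    return h

# Sanity check against a direct dense recurrence.
rng = np.random.default_rng(0)
d, d_in, T = 4, 2, 8
A = 0.3 * rng.standard_normal((d, d))   # modest scale keeps iterates well-conditioned
B = rng.standard_normal((d, d_in))
x = rng.standard_normal((T, d_in))

h_fp = dense_rnn_via_fixed_point(A, B, x)
h_ref = np.zeros(d)
for t in range(T):
    h_ref = A @ h_ref + B @ x[t]
print(np.allclose(h_fp[-1], h_ref))     # True
```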