We present a method that enables length generalization in state-space models by modulating the $A$ matrices on a per-layer basis.
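
To make the idea concrete, the following is a minimal sketch, not the method itself. It assumes a diagonal linear state-space layer with state update $h_t = \bar{A} h_{t-1} + B x_t$ and readout $y_t = C h_t$, where $\bar{A} = \exp(\Delta \cdot s \cdot A)$. The per-layer scalar `scale` ($s$), the step size `dt` ($\Delta$), and the function names `ssm_layer` and `ssm_stack` are all hypothetical, introduced only for illustration; the actual modulation rule is not specified here.

```python
import numpy as np

def ssm_layer(x, A_diag, B, C, dt=1.0, scale=1.0):
    """Run a diagonal linear SSM over a 1-D input sequence.

    `scale` is a hypothetical per-layer factor that modulates A. It is an
    illustrative assumption, not a modulation rule taken from the source.
    """
    # Discretized transition; entries lie in (0, 1) when A_diag < 0 and scale > 0.
    A_bar = np.exp(dt * scale * A_diag)
    h = np.zeros_like(A_diag)
    ys = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A_bar * h + B * x_t   # state update
        ys[t] = C @ h             # scalar readout
    return ys

def ssm_stack(x, layers, scales):
    """Apply several SSM layers in sequence, each with its own A scaling."""
    for (A_diag, B, C), s in zip(layers, scales):
        x = ssm_layer(x, A_diag, B, C, scale=s)
    return x
```

In this sketch, a smaller `scale` makes the impulse response of a layer decay more slowly, so that layer retains information over a longer span of the sequence. This is one plausible way in which adjusting $A$ per layer could affect behavior on sequences longer than those seen in training.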