We propose a method that enables length generalization in state-space models by modulating the $A$ matrices on a per-layer basis.