1 paper across 1 session
We show that shallow linear transformers fail to learn linear dynamical systems in context, uncovering a distinction between in-context learning over i.i.d. and non-i.i.d. data; in contrast, log-depth transformers successfully learn dynamical systems.
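To make the i.i.d. versus non-i.i.d. distinction concrete, the following is a minimal sketch and not the paper's experimental setup. It contrasts independently drawn regression examples with a trajectory from a linear dynamical system of the assumed form x_{t+1} = A x_t + noise, where each state depends on the previous one. The function names, dimensions, noise level, and the choice of a scaled orthogonal matrix for A are all hypothetical, and the least-squares fit at the end is a classical baseline rather than anything a transformer computes.

```python
import numpy as np


def sample_iid_regression(n, d, rng):
    """i.i.d. setting: every (x_i, y_i) pair is drawn independently of the others."""
    w = rng.standard_normal(d)          # hidden regression vector
    X = rng.standard_normal((n, d))     # independent inputs
    y = X @ w                           # noiseless linear labels
    return X, y, w


def sample_lds_trajectory(n, d, rng, noise_std=0.1):
    """Non-i.i.d. setting: x_{t+1} = A x_t + noise, so consecutive states are correlated."""
    # Assumption for illustration: a scaled orthogonal matrix, which keeps
    # every eigenvalue at magnitude 0.9 and the system stable.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    A = 0.9 * Q
    X = np.zeros((n + 1, d))
    X[0] = rng.standard_normal(d)
    for t in range(n):
        X[t + 1] = A @ X[t] + noise_std * rng.standard_normal(d)
    return X, A


rng = np.random.default_rng(0)
X_iid, y_iid, w = sample_iid_regression(200, 4, rng)
X_lds, A = sample_lds_trajectory(200, 4, rng)

# Classical baseline: ordinary least squares on consecutive state pairs.
# X_lds[1:] is approximately X_lds[:-1] @ A.T, so the solution is transposed back.
A_hat = np.linalg.lstsq(X_lds[:-1], X_lds[1:], rcond=None)[0].T
```

In the i.i.d. case the rows of `X_iid` can be shuffled without changing the problem; in the trajectory case the order of the rows of `X_lds` carries the information needed to recover `A`.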