PhD student, University of Minnesota - Twin Cities
1 paper at NeurIPS 2025
We show that shallow linear transformers fail to learn linear dynamical systems in context, uncovering a distinction between in-context learning over i.i.d. and non-i.i.d. data; in contrast, transformers with logarithmic depth successfully learn dynamical systems.