Postdoc, Massachusetts Institute of Technology
1 paper at NeurIPS 2025
Transformer, Mamba, and RWKV language models show consistent patterns of change in behavior over the course of training