1 paper across 1 session
Transformer, Mamba, and RWKV language models show consistent patterns of change in behavior over the course of training