2 papers across 2 sessions
Transformer, Mamba, and RWKV language models show consistent patterns of change in behavior over the course of training
We formally and empirically analyze shortcuts arising in concept-based models