PhD student, University of Michigan - Ann Arbor
1 paper at NeurIPS 2025
Early-phase training of Transformers on algorithmic tasks shows a loss plateau, repetition bias, and representation collapse before a sudden drop in loss.