Assistant Professor, University of Michigan - Ann Arbor
2 papers at NeurIPS 2025
During the early phase of training on algorithmic tasks, Transformers show a loss plateau, repetition bias, and representation collapse before a sudden drop in loss.
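To make these diagnostics concrete, here is a minimal sketch (not from the paper; all names and thresholds are illustrative) of two quantities one might log during early training: an entropy-based effective rank of hidden states as a proxy for representation collapse, and the fraction of consecutive repeated tokens in greedy decodes as a proxy for repetition bias.

```python
# Hypothetical diagnostics for early-phase training; illustrative only.
import torch

def effective_rank(hidden: torch.Tensor) -> float:
    """Entropy-based effective rank of a (batch, dim) matrix of hidden
    states. Values near 1 suggest collapsed, low-diversity representations."""
    s = torch.linalg.svdvals(hidden - hidden.mean(dim=0))
    p = s / s.sum()
    entropy = -(p * torch.log(p + 1e-12)).sum()
    return torch.exp(entropy).item()

def repetition_rate(tokens: torch.Tensor) -> float:
    """Fraction of positions in a (batch, seq_len) tensor of greedily
    decoded token ids that repeat the immediately preceding token."""
    repeats = (tokens[:, 1:] == tokens[:, :-1]).float()
    return repeats.mean().item()

# Usage with stand-in data: h plays the role of final-layer representations,
# y the role of greedy decodes on an algorithmic task.
h = torch.randn(64, 128)
y = torch.randint(0, 10, (64, 32))
print(effective_rank(h), repetition_rate(y))
```

Logged over training steps, one would expect the effective rank to dip and the repetition rate to spike during the plateau, then recover around the sudden drop in loss.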
We prove that, under appropriate conditions, a single-head softmax attention mechanism exhibits benign overfitting.
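For reference, below is a self-contained sketch of the object of study, single-head softmax attention, in its standard scaled-dot-product form; this is a generic implementation, not the paper's exact parameterization or the conditions of the theorem.

```python
# Generic single-head softmax attention; standard form, for illustration.
import torch
import torch.nn as nn

class SingleHeadSoftmaxAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)  # query projection
        self.k = nn.Linear(dim, dim, bias=False)  # key projection
        self.v = nn.Linear(dim, dim, bias=False)  # value projection
        self.scale = dim ** -0.5                  # 1/sqrt(d) scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # (batch, seq_len, dim)

# Usage
x = torch.randn(2, 16, 64)
print(SingleHeadSoftmaxAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```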