3 papers across 2 sessions
We prove that, under appropriate conditions, a single-head softmax attention mechanism exhibits benign overfitting.