We prove that, under appropriate conditions, a single-head softmax attention mechanism exhibits benign overfitting.
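The following is a minimal, dependency-free Python sketch of one common way to write a single-head softmax attention classifier. It makes the objects in this claim concrete. The parameterization f(X) = ν^⊤ X^⊤ softmax(Xp) is an assumption for illustration and is not necessarily the model analyzed here. It uses a trainable attention vector `p` and a linear readout `nu`. The names `single_head_attention_score`, `predict`, and `training_error` are hypothetical.

In the standard sense, benign overfitting means two things hold together. First, the trained model drives `training_error` to zero even when some training labels are corrupted by noise. Second, its test error stays close to the best achievable, for example near the label-noise rate.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention_score(
    tokens: Sequence[Sequence[float]], p: Sequence[float], nu: Sequence[float]
) -> float:
    """Hypothetical scalar model f(X) = nu^T X^T softmax(X p).

    tokens: rows of X (T tokens, each of dimension d).
    p:      trainable attention (query) vector, an illustrative assumption.
    nu:     trainable linear readout, an illustrative assumption.
    """
    # One attention head: a softmax distribution over the T tokens.
    weights = softmax([dot(x, p) for x in tokens])
    d = len(tokens[0])
    # Attention-weighted average of the tokens, i.e. X^T softmax(X p).
    pooled = [sum(w * x[j] for w, x in zip(weights, tokens)) for j in range(d)]
    return dot(nu, pooled)


def predict(tokens: Sequence[Sequence[float]], p: Sequence[float], nu: Sequence[float]) -> int:
    # Binary label in {-1, +1} from the sign of the attention output.
    return 1 if single_head_attention_score(tokens, p, nu) >= 0 else -1


def training_error(
    examples: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[int],
    p: Sequence[float],
    nu: Sequence[float],
) -> float:
    # Fraction of training examples whose predicted sign disagrees with the label.
    # "Overfitting" in the benign-overfitting sense means this reaches 0 even on
    # examples whose labels were flipped by noise.
    mistakes = sum(1 for X, y in zip(examples, labels) if predict(X, p, nu) != y)
    return mistakes / len(labels)
```

For example, with `X = [[1.0, 0.0], [0.0, 1.0]]` and `p = [10.0, 0.0]`, almost all attention mass falls on the first token. With `nu = [1.0, -1.0]`, the model therefore predicts `+1`. Pure Python is used only to keep the sketch self-contained; any real experiment would use an array library and a training loop, neither of which is shown here.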