softmax self-attention; theory of transformers; statistical physics; optimization dynamics - NeurIPS 2025

1 paper across 1 session

Poster Session 4

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Understanding Softmax Attention Layers: Exact Mean-Field Analysis on a Toy Problem

#5311 · Elvis Dohmatob

We prove that a softmax self-attention layer trained via gradient descent (GD) can solve the so-called single-locator regression problem