We prove that a softmax self-attention layer trained via gradient descent (GD) can solve the so-called single-locator regression problem.