Researcher, Facebook
1 paper at NeurIPS 2025
We prove that a softmax self-attention layer trained via gradient descent (GD) can solve the so-called single-locator regression problem.