Assistant Professor, Weizmann Institute of Science
3 papers at NeurIPS 2025
We derive simple generalization bounds for Markov training processes at any time during training, and then apply them to training with Langevin dynamics to improve existing bounds.
We prove that, under appropriate conditions, a single-head softmax attention mechanism exhibits benign overfitting.
We prove that, under appropriate conditions, linear attention is an almost optimal metalearner for linear classification.