Assistant Professor, Stanford University
Two papers at NeurIPS 2025
We introduce Alternating Gradient Flows, a framework that models feature learning in two-layer networks with small initialization as alternating utility maximization and cost minimization, unifying saddle-to-saddle analyses and explaining the emergence of Fourier features.
We show that limiting a model's confidence during training can improve test-time scaling in mathematical reasoning.