Full Professor, University of California, Berkeley
3 papers at NeurIPS 2025
We introduce Alternating Gradient Flows, a framework that models feature learning in two-layer networks with small initialization as an alternation between utility maximization and cost minimization, unifying saddle-to-saddle analyses and explaining the emergence of Fourier features. A toy illustration of the saddle-to-saddle phenomenon follows below.
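This is not the paper's code, just a minimal numerical sketch of the phenomenon the framework targets: with small initialization, a two-layer linear network fitting a low-rank target learns one singular mode at a time, so the loss descends in a staircase of plateaus (saddle-to-saddle dynamics). All dimensions, scales, and learning rates here are illustrative choices.

```python
# Hedged sketch: saddle-to-saddle dynamics in a two-layer linear network
# with small initialization. The loss plateaus, drops, and plateaus again
# as each singular mode of the target is learned in turn.
import numpy as np

rng = np.random.default_rng(0)
d, h = 6, 6                        # input/output dim, hidden width
# Target map with well-separated singular values 3, 2, 1.
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
V, _ = np.linalg.qr(rng.normal(size=(d, d)))
S = np.diag([3.0, 2.0, 1.0] + [0.0] * (d - 3))
T = U @ S @ V.T

scale = 1e-3                       # small initialization
W1 = scale * rng.normal(size=(h, d))
W2 = scale * rng.normal(size=(d, h))

lr, steps = 0.01, 1500
for t in range(steps):
    E = W2 @ W1 - T                # residual of the end-to-end map
    g2, g1 = E @ W1.T, W2.T @ E    # gradients of 0.5 * ||W2 W1 - T||_F^2
    W2 -= lr * g2
    W1 -= lr * g1
    if t % 100 == 0:
        print(f"step {t:5d}  loss {0.5 * np.sum(E**2):.6f}")
```

Running this prints a loss curve that sits near 7.0, steps down to about 2.5 once the leading mode escapes its plateau, then to 0.5, then to zero, one plateau per singular value.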
We solve the learning dynamics of (a close approximation of) word2vec in closed form, revealing what semantic features are learned.
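As a hedged stand-in for the paper's closed-form analysis (which I do not reproduce here), the sketch below uses the classic observation that SGNS-style word2vec implicitly factorizes a PMI-type co-occurrence matrix (Levy and Goldberg, 2014); under a quadratic approximation with small initialization, the top eigendirections of such a matrix are learned first, so inspecting them hints at which semantic features emerge. The toy corpus and window size are made up for illustration.

```python
# Hedged sketch: read off candidate "semantic features" as the leading
# eigendirections of a positive PMI co-occurrence matrix on a toy corpus.
import numpy as np

corpus = ("king queen royal crown palace "
          "dog cat pet fur paw "
          "king crown royal dog pet cat").split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, window = len(vocab), 2

# Symmetric co-occurrence counts within a +/- 2-word window.
C = np.zeros((V, V))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            C[idx[w], idx[corpus[j]]] += 1.0

# Positive PMI: log p(w, c) / (p(w) p(c)), clipped at zero.
p_wc = C / C.sum()
p_w = p_wc.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_wc / (p_w @ p_w.T))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Top eigendirections = the features a quadratic model would learn first.
eigvals, eigvecs = np.linalg.eigh(ppmi)
for k in range(1, 3):              # two leading components
    v = eigvecs[:, -k]
    top = np.argsort(-np.abs(v))[:4]
    print(f"component {k}: " +
          ", ".join(f"{vocab[t]}({v[t]:+.2f})" for t in top))
```

On this toy corpus the leading components separate the royalty words from the pet words, which is the flavor of semantic feature the closed-form analysis characterizes.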