PhD student, University of California, Berkeley
3 papers at NeurIPS 2025
We introduce Alternating Gradient Flows, a framework that models feature learning in two-layer networks with small initialization as alternating utility maximization and cost minimization, unifying saddle-to-saddle analyses and explaining the emergence of Fourier features.
We solve the learning dynamics of (a close approximation of) word2vec in closed form, revealing what semantic features are learned.
We introduce a theoretical model of word co-occurrence statistics that quantitatively explains the emergence of linear analogies in word embeddings from models like word2vec and GloVe.