2 papers across 1 session
We show that deep neural networks, across architectures and training conditions, all instantiate the same abstract algorithm for modular addition.
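As context for what such a shared algorithm can look like, below is a minimal sketch of the Fourier ("clock") construction widely reported in interpretability work on modular addition: inputs are embedded as cosines and sines at a few frequencies, combined with angle-addition identities, and read out so the logit for the true sum is maximal. The modulus, frequency set, and all names here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative sketch of the Fourier ("clock") algorithm for modular
# addition; p and the frequency set are hypothetical choices.
p = 97                       # modulus
freqs = np.array([3, 14, 41])  # a small set of key frequencies (assumed)

def embed(x):
    """Map a residue x to cos/sin features at each key frequency."""
    theta = 2 * np.pi * freqs * x / p
    return np.cos(theta), np.sin(theta)

def logits(a, b):
    """Score every candidate answer c; the true sum scores highest."""
    ca, sa = embed(a)
    cb, sb = embed(b)
    # Angle-addition identities yield cos/sin of 2*pi*f*(a+b)/p.
    c_sum = ca * cb - sa * sb
    s_sum = sa * cb + ca * sb
    c = np.arange(p)
    theta_c = 2 * np.pi * np.outer(freqs, c) / p
    # Each logit is sum_f cos(2*pi*f*(a+b-c)/p), maximal at c = a+b mod p.
    return (c_sum[:, None] * np.cos(theta_c)
            + s_sum[:, None] * np.sin(theta_c)).sum(axis=0)

a, b = 42, 77
assert np.argmax(logits(a, b)) == (a + b) % p
```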
We introduce Alternating Gradient Flows, a framework that models feature learning in two-layer networks with small initialization as an alternation between utility maximization and cost minimization, unifying saddle-to-saddle analyses and explaining the emergence of Fourier features.
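The framework itself is not reproduced here; the sketch below only sets up the raw phenomenon that saddle-to-saddle and AGF-style analyses describe: a two-layer network with very small initialization whose training loss sits on long plateaus, then drops in discrete steps as neurons activate. The teacher task, widths, learning rate, and initialization scale are all assumptions chosen for demonstration.

```python
import numpy as np

# Assumed setup: tiny two-layer ReLU network, small init, plain
# full-batch gradient descent on a single-ReLU teacher.
rng = np.random.default_rng(0)
n, d, h = 256, 8, 32
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.maximum(X @ w_star, 0.0)           # single-ReLU teacher (assumed)

scale = 1e-4                              # small init -> long plateaus
W = scale * rng.standard_normal((h, d))   # first-layer weights
a = scale * rng.standard_normal(h)        # second-layer weights
lr = 0.05

for step in range(20001):
    pre = X @ W.T                         # (n, h) pre-activations
    act = np.maximum(pre, 0.0)
    pred = act @ a
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)
    # Full-batch gradients for both layers.
    ga = act.T @ err / n
    gW = ((err[:, None] * a) * (pre > 0)).T @ X / n
    a -= lr * ga
    W -= lr * gW
    if step % 2000 == 0:
        # Typically the loss hovers near its initial plateau while
        # dormant neurons slowly align (the utility phase), then drops
        # sharply once a neuron activates (the cost phase).
        print(f"step {step:6d}  loss {loss:.6f}")
```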