5 papers across 3 sessions
We solve the learning dynamics of (a close approximation of) word2vec in closed form, revealing what semantic features are learned.
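For context, assuming the standard skip-gram-with-negative-sampling (SGNS) formulation of word2vec (an assumption; the "close approximation" analyzed may differ), the per-pair SGNS objective is

$$\ell(w, c) \;=\; \log \sigma\!\left(u_w^{\top} v_c\right) \;+\; \sum_{j=1}^{k} \mathbb{E}_{c_j \sim P_n}\!\left[\log \sigma\!\left(-u_w^{\top} v_{c_j}\right)\right],$$

where $u_w$ and $v_c$ are the word and context embeddings, $\sigma$ is the logistic function, $k$ is the number of negative samples, and $P_n$ is the noise distribution.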
In sign-diverse Hebbian/anti-Hebbian or excitatory-inhibitory (E-I) networks, the learning dynamics contain inherent non-gradient “curl” terms that, depending on the network architecture, can destabilize gradient-descent solutions or, paradoxically, accelerate learning beyond pure gradient flow.
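One common way to make the “curl” terminology precise, given here as a sketch rather than the paper's exact decomposition: write the weight dynamics as gradient flow preconditioned by a matrix with symmetric and antisymmetric parts,

$$\dot{w} \;=\; -\,(S + A)\,\nabla_{w} L(w), \qquad S = S^{\top}, \quad A = -A^{\top},$$

so that $\tfrac{d}{dt} L = -\nabla L^{\top} S\, \nabla L$ because $\nabla L^{\top} A\, \nabla L = 0$. The antisymmetric (curl) term does no instantaneous work on the loss but rotates trajectories around fixed points, which is consistent with it either destabilizing gradient-descent solutions or speeding up convergence depending on the architecture.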
We mathematically model the evolution of the neural tangent kernel (NTK) in a deep neural network trained to represent natural images.
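As a reminder of the object being tracked (this is the standard definition of the empirical NTK, not a claim about the paper's specific model), for a scalar-output network $f_\theta$ at training time $t$,

$$\Theta_t(x, x') \;=\; \nabla_{\theta} f_{\theta_t}(x)^{\top}\, \nabla_{\theta} f_{\theta_t}(x'),$$

and under full-batch gradient flow on a squared-error loss the outputs evolve as $\dot{f}_t(x) = -\sum_i \Theta_t(x, x_i)\,\big(f_t(x_i) - y_i\big)$; the modeling question is how $\Theta_t$ itself changes over the course of training on natural images.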
Closed-form analysis of diffusion-model learning dynamics uncovers an inverse-variance law of distributional convergence; MLP U-Nets comply with it, while convolutional U-Nets break it.
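For background only (an assumption about the training setup; the closed-form analysis may use a different noise schedule or parameterization), both MLP and convolutional U-Net denoisers are typically trained on the standard denoising objective

$$\mathcal{L} \;=\; \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}\!\left[\big\|\epsilon - \epsilon_\theta\!\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon,\; t\big)\big\|^2\right],$$

where $\bar{\alpha}_t$ is the cumulative noise schedule and $\epsilon_\theta$ is the learned denoiser.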
While the loss decreases monotonically during LLM training, the representations pass through distinct geometric phases across pretraining and post-training, and these phases determine when and how the model acquires memorization or generalization capabilities.