4 papers across 2 sessions
We show that learning sparse attention exhibits emergent behaviors during training, and study (theoretically and empirically) how data and model design influence the speed of emergence.
Emergence of an alignment between LLMs' and the brain's computational dynamics, and the key factors enabling it: scale and context size.
We discover that coherent value systems emerge with scale in LLMs and propose utility engineering as a research avenue for analyzing and controlling these emergent value systems.
We introduce Alternating Gradient Flows, a framework that models feature learning in two-layer networks with small initialization as alternating utility maximization and cost minimization, unifying saddle-to-saddle analyses and explaining the emergence of Fourier features.