Poster Session 5 · Friday, December 5, 2025 11:00 AM → 2:00 PM
#1015
In-Context Learning Strategies Emerge Rationally
Abstract
Recent work analyzing in-context learning (ICL) has identified a broad set of strategies that describe model behavior in different experimental conditions. We aim to unify these findings by asking why a model learns these disparate strategies in the first place.
Specifically, we start with the observation that when trained to learn a mixture of tasks, as is popular in the literature, the strategies learned by a model for performing ICL can be captured by a family of Bayesian predictors: a memorizing predictor, which assumes a discrete prior on the set of seen tasks, and a generalizing predictor, where the prior matches the underlying task distribution.
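The two predictors above can be illustrated with a minimal sketch in a toy setting not taken from the paper: each task is a Gaussian mean, the memorizing predictor places a discrete uniform prior on the finite set of seen task means, and the generalizing predictor uses the true Gaussian prior over tasks. All names and parameter values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): each task is a mean mu ~ N(0, tau^2),
# and in-context observations within a task are y ~ N(mu, sigma^2).
tau, sigma = 1.0, 0.5
seen_tasks = rng.normal(0.0, tau, size=8)   # finite set of pretraining tasks

def memorizing_predict(context):
    """Posterior-predictive mean under a discrete uniform prior on seen tasks."""
    # log-likelihood of the context under each seen task mean
    ll = np.array([-0.5 * np.sum((context - mu) ** 2) / sigma**2
                   for mu in seen_tasks])
    w = np.exp(ll - ll.max())   # subtract max for numerical stability
    w /= w.sum()
    return float(w @ seen_tasks)

def generalizing_predict(context):
    """Posterior-predictive mean under the task-distribution prior N(0, tau^2)."""
    n = len(context)
    # standard conjugate Gaussian update (shrinkage toward the prior mean 0)
    post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
    return float(post_var * context.sum() / sigma**2)

context = rng.normal(seen_tasks[0], sigma, size=5)  # context from a seen task
print(memorizing_predict(context), generalizing_predict(context))
```

On contexts drawn from a seen task the memorizing predictor snaps to that task's mean, while the generalizing predictor shrinks the sample mean toward the prior; the gap between the two is what distinguishes the strategies behaviorally.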
Adopting the normative lens of rational analysis, where a learner's behavior is explained as an optimal adaptation to data given computational constraints, we develop a hierarchical Bayesian framework that almost perfectly predicts Transformer next-token predictions throughout training, without assuming access to its weights. Under this framework, pretraining is viewed as a process of updating the posterior probability of different strategies, and inference-time behavior as a posterior-weighted average over these strategies' predictions.
Our framework draws on common assumptions about neural network learning dynamics, which make explicit a tradeoff between loss and complexity among candidate strategies: beyond how well it explains the data, a model's preference towards implementing a strategy is dictated by its complexity. This helps explain well-known ICL phenomena, while offering novel predictions: e.g., we show a superlinear trend in the timescale for transitioning from generalization to memorization as task diversity increases. Overall, our work advances an explanatory and predictive account of ICL grounded in tradeoffs between strategy loss and complexity.
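The loss-complexity tradeoff and the posterior-weighted prediction can be sketched together; this is not the paper's implementation, just a minimal MDL-style caricature in which the log-posterior of a strategy is penalized by its complexity, with all function names, penalty form, and numbers invented for illustration.

```python
import numpy as np

# Illustrative sketch: pretraining as updating a posterior over strategies,
# trading off data fit (per-example loss) against strategy complexity.
def strategy_posterior(losses, complexities, n_seen, beta=1.0):
    """log-posterior ~ -(n_seen * loss) - beta * complexity (MDL-style penalty)."""
    log_post = -(n_seen * np.asarray(losses)) - beta * np.asarray(complexities)
    log_post -= log_post.max()                 # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def combined_prediction(strategy_preds, posterior):
    """Inference-time behavior: posterior-weighted average of strategy outputs."""
    return posterior @ np.asarray(strategy_preds)

# Two strategies: memorizing (lower loss on seen tasks, higher complexity) vs
# generalizing (slightly higher loss, lower complexity). Numbers are made up.
losses = [0.10, 0.12]
complexities = [50.0, 5.0]

early = strategy_posterior(losses, complexities, n_seen=10)    # complexity dominates
late = strategy_posterior(losses, complexities, n_seen=10000)  # loss dominates
print(early, late)
```

Early in training the complexity penalty dominates and the simpler generalizing strategy wins the posterior; as more data is seen the loss term dominates and the posterior shifts to the memorizing strategy, mirroring the generalization-to-memorization transition described above.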