3 papers across 3 sessions
We study the mechanism of chain of continuous thought on the graph reachability problem and show, both theoretically and empirically, that it can reason by maintaining a superposition of multiple search traces.
The neural scaling law in LLMs is explained by representation interference due to superposition.