2 papers across 2 sessions
We develop a theoretical model of information-flow constraints in LLMs that predicts their failures on global reasoning tasks.
We establish, with tight bounds and supporting experiments, that Local SGD converges faster under low second-order heterogeneity.