1 paper across 1 session
We introduce the Martingale Score, an unsupervised metric from Bayesian statistics, to show that reasoning in LLMs often leads to belief entrenchment rather than truth-seeking, and shows this score predicts ground-truth accuracy.