AI Safety Fellow, Anthropic
1 paper at NeurIPS 2025
We introduce the Martingale Score, an unsupervised metric from Bayesian statistics, to show that reasoning in LLMs often leads to belief entrenchment rather than truth-seeking, and shows this score predicts ground-truth accuracy.