Staff Research Scientist, Google
3 papers at NeurIPS 2025
A framework and benchmark to evaluate language models' reasoning on imperfect tabular data.
Debate between AI experts outperforms single-advisor consultancy in helping humans make more accurate factual judgments, especially benefiting those with mainstream beliefs.