PhD student, Stanford University
2 papers at NeurIPS 2025
We show that the statistical optimality of estimation methods for causal inference depends in a surprising way on the distribution of the treatment noise.
We introduce IneqMath, an informal inequality-proving benchmark, and an LLM-as-judge evaluation suite, revealing that top LLMs achieve <10% overall accuracy due to flawed step-wise reasoning.