Full Professor, Princeton University
3 papers at NeurIPS 2025
We introduce Ineq-Comp, a benchmark for testing compositional reasoning in formal inequality proving. Applying simple, human-intuitive transformations to the inequalities causes major accuracy drops, showing that current LLM provers lack robust compositional generalization.
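
As a rough illustration of what a human-intuitive transformation could look like (an assumed example, not taken from the benchmark itself), the Lean 4 sketch below states a basic two-variable inequality and a composed variant that adds two independent copies of it. It assumes Mathlib is available; the theorem names `base_ineq` and `composed_ineq` are hypothetical.

```lean
import Mathlib

-- Hypothetical base problem: x^2 + y^2 ≥ 2xy, which follows from (x - y)^2 ≥ 0.
theorem base_ineq (x y : ℝ) : x ^ 2 + y ^ 2 ≥ 2 * x * y := by
  nlinarith [sq_nonneg (x - y)]

-- Hypothetical composed variant: two independent copies of the base
-- inequality, added together. The same idea is simply applied twice.
theorem composed_ineq (a b c d : ℝ) :
    (a ^ 2 + b ^ 2) + (c ^ 2 + d ^ 2) ≥ 2 * a * b + 2 * c * d := by
  nlinarith [sq_nonneg (a - b), sq_nonneg (c - d)]
```

A human reader sees the second statement as two instances of the first, so a prover that generalizes compositionally should find it no harder than the base problem.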