PhD student, Department of Computer Science, ETH Zurich
1 paper at NeurIPS 2025
We introduce RealMath, a novel benchmark derived directly from research papers and mathematical forums that assesses LLMs' abilities on authentic mathematical tasks.