Intern, Mohamed bin Zayed University of Artificial Intelligence
1 paper at NeurIPS 2025
We introduce PHYSICS, a dataset of 16,568 high-quality physics problems spanning a range of subjects and difficulty levels, together with an improved evaluation framework that combines rule-based and model-based judgments to advance LLMs' physical reasoning.