We introduce PHYSICS, a dataset of 16,568 high-quality physics problems spanning multiple subjects and difficulty levels, together with an improved evaluation framework that combines rule-based and model-based judgments, to advance the physical reasoning of LLMs.