Researcher, Shanghai Artificial Intelligence Laboratory
2 papers at NeurIPS 2025
We introduce PHYSICS, a dataset of 16,568 high-quality physics problems spanning a range of subjects and difficulty levels, together with an improved evaluation framework that combines rule-based and model-based judgments, to advance the physical reasoning of LLMs.
We train LLMs under a new reasoning paradigm that explicitly incorporates meta-thinking, using reinforcement learning (RL) in a multi-agent, multi-turn setting.