Associate Professor, The Chinese University of Hong Kong
5 papers at NeurIPS 2025
We introduce PHYSICS, a dataset of 16,568 high-quality physics problems spanning multiple subjects and difficulty levels, together with an improved evaluation framework that combines rule-based and model-based judgments, to advance LLMs' physical reasoning.
We introduce ComPABench to evaluate the compositional reasoning of VLMs. We show that existing post-training methods struggle, whereas enhancing vision-text alignment and using progress rewards improves RL-based compositional ability.
We learn offline meta-policies from natural language supervision with contrastive language-decision pre-training, aligning text embeddings to comprehend environment dynamics.