PhD student, The Hong Kong University of Science and Technology
1 paper at NeurIPS 2025
We introduce IR3D-Bench, a benchmark that challenges vision-language models to demonstrate genuine scene understanding by using tools to recreate the 3D structure of a scene from an image, rather than merely describing it.