MS student, Tianjin University
1 paper at NeurIPS 2025
We introduce IR3D-Bench, a benchmark that challenges vision-language models to demonstrate genuine scene understanding by using tools to recreate the 3D structure of a scene from an image, rather than merely describing it.