We introduce IR3D-Bench, a benchmark that challenges vision-language models to demonstrate real scene understanding by recreating 3D structures from images using tools, not just describing them.