Zhenyuan Zhang

PhD student, The Hong Kong University of Science and Technology

1 paper at NeurIPS 2025

OpenReview· Semantic Scholar· Google Scholar

Poster Session 6

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering

#4914 · Hengyu Liu, Chenxin Li, Zhengxin Li, Yipeng Wu, Wuyang Li, Zhiqin Yang, Zhenyuan Zhang, Yunlong Lin, Sirui Han, Brandon Y. Feng

We introduce IR3D-Bench, a benchmark that challenges vision-language models to demonstrate real scene understanding by recreating 3D structures from images using tools, not just describing them.