Yuhao Dong

PhD student, Nanyang Technological University

3 papers at NeurIPS 2025

OpenReview · Semantic Scholar · Google Scholar

Poster Session 1

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

#4706 · Hongbo Liu, Jingwen He, Yi Jin, Dian Zheng, Yuhao Dong, Fan Zhang, Ziqi Huang, Yinan He, Weichao Chen, Yu Qiao, Wanli Ouyang, Shengjie Zhao, Ziwei Liu

We present ShotBench, a new benchmark for evaluating VLMs' understanding of cinematography, along with the ShotQA dataset and our ShotVL model, which achieves state-of-the-art performance, surpassing both strong open-source and proprietary baselines.

Poster Session 2

2 papers

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C, D, E

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

#2311 Spotlight · Lingdong Kong, Dongyue Lu, Alan Liang, Rong Li, Yuhao Dong, Tianshuai Hu, Lai Xing Ng, Wei Tsang Ooi, Benoit R Cottereau

Talk2Event is a new benchmark for attribute-aware visual grounding from event cameras.

3EED: Ground Everything Everywhere in 3D

#4603 · Rong Li, Yuhao Dong, Tianshuai Hu, Alan Liang, Youquan Liu, Dongyue Lu, Liang Pan, Lingdong Kong, Junwei Liang, Ziwei Liu

We present 3EED, the first large-scale benchmark for 3D visual grounding across vehicles, drones, and quadrupeds, with over 134K 3D objects and 25K human-verified expressions in diverse outdoor scenes.