PhD student, Nanyang Technological University
3 papers at NeurIPS 2025
We present ShotBench, a new benchmark for evaluating VLMs' cinematography understanding, along with the ShotQA dataset and our ShotVL model, which achieves state-of-the-art performance, surpassing both strong open-source and proprietary baselines.
Talk2Event is a new benchmark for attribute-aware visual grounding from event cameras.
We present 3EED, the first large-scale benchmark for 3D visual grounding across vehicles, drones, and quadrupeds, with over 134K 3D objects and 25K human-verified expressions in diverse outdoor scenes.