Assistant Professor, University of Maryland, College Park
4 papers at NeurIPS 2025
This paper introduces COLORBENCH, a comprehensive benchmark, together with a detailed analysis of how well VLMs perceive, reason about, and robustly handle colors.
We present VideoHallu, a benchmark of over 3,000 synthetic videos with expert-crafted counterintuitive QA pairs that evaluates MLLMs' ability to detect perceptually obvious abnormalities that are often missed due to language priors.