MS student, Brown University
1 paper at NeurIPS 2025
We present VideoHallu, a benchmark of over 3,000 synthetic videos paired with expert-crafted, counterintuitive QA pairs. It evaluates whether MLLMs can detect perceptually obvious abnormalities, which they often miss because of language priors.