Researcher, Princeton University
3 papers at NeurIPS 2025
We release a cognitively inspired benchmark for reasoning across scenes, which reveals that hallucination remains an open challenge for multimodal models.
A comprehensive empirical study on how coreset selection methods impact bias and group robustness of downstream models.
Our new benchmark, AbstentionBench, reveals that reasoning models struggle to determine when not to answer.