2 papers across 2 sessions
We release a cognitively inspired benchmark for reasoning across scenes, which reveals that hallucination remains an open challenge for multimodal models.
Introducing PARTONOMY, a new benchmark, and PLUM, a segmenting LMM, which together enable fine-grained, part-level visual reasoning by addressing architectural flaws in existing LMMs and set a new standard for grounded multimodal understanding.