3 papers across 2 sessions
We propose GLSim, a training-free framework that combines global and local embedding-similarity signals to detect object hallucinations in LVLMs accurately, outperforming prior methods.
Contrastive decoding fails to mitigate object hallucinations in MLLMs—apparent improvements stem from misleading factors rather than genuine effectiveness.