1 paper across 1 session
A novel benchmark that uses a comprehensive preference dataset to evaluate multimodal judges from multiple key perspectives.