2 papers across 1 session
We propose a novel Cross-Scene Spatial Reasoning and Grounding task, along with a new baseline and a benchmark dataset.
We propose Refer-Judge, a Jury-and-Judge Chain-of-Thought framework that leverages MLLMs to uncover toxic data in 3D visual grounding.