We propose a novel Cross-Scene Spatial Reasoning and Grounding task, along with a new baseline and a benchmark dataset.