2 papers across 2 sessions
A novel 3D audio-visual question-answering (QA) benchmark and a training-free spatial reasoning pipeline for audio-visual LLMs
We introduce SpatialReasoner, a novel large vision-language model that improves 3D spatial reasoning with explicit 3D representations and generalizable 3D thinking.