2 papers across 2 sessions
A novel 3D audio-visual QA benchmark and a training-free spatial reasoning pipeline for Audio-Visual LLMs