3 papers across 2 sessions
A novel 3D audio-visual QA benchmark and a training-free spatial reasoning pipeline for Audio-Visual LLMs.
A novel alignment framework that integrates generative reward models with multi-modal RLHF.