3 papers across 2 sessions
Hypernetwork-based framework for dynamic adaptation of precomputed database features.
This paper introduces DAVE, a diagnostic benchmark that requires both audio and visual inputs and separates evaluation into subcategories to reveal specific failure modes in audio-visual models.