We propose DeepASA, an object-oriented one-for-all network that mimics human auditory scene analysis to perform source separation, sound event detection, and direction-of-arrival estimation, achieving state-of-the-art performance on downstream tasks.