Full Professor, Carnegie Mellon University
3 papers at NeurIPS 2025
A small audio-language model for audio reasoning that achieves state-of-the-art (SoTA) performance with 50 times fewer parameters and 60 times fewer audio hours.
This paper introduces a simple yet efficient learning mechanism that makes the alignment between visual and textual modalities more robust by solving shuffling problems.