2 papers across 2 sessions
This paper introduces a novel Multimodal Attention-based Normalizing Flow approach for explicit, interpretable, and tractable multimodal fusion learning.
This paper introduces a simple but efficient learning mechanism that makes the alignment between visual and textual modalities more robust by solving shuffling problems.