4 papers across 3 sessions
Aligning pretrained unimodal models with the proposed framework using limited paired data yields gains of ~52% in cross-modality zero-shot classification and ~92% in retrieval.
We propose a novel algorithm, NGN-M, with a strong theoretical convergence analysis and extensive numerical evaluations showing its robustness to the choice of the learning rate hyperparameter.