4 papers across 2 sessions
We introduce a large-scale synthetic dataset and a fine-grained alignment framework for composed person retrieval, and provide a manually annotated benchmark test set for objective evaluation.
The paper introduces Generalized Contrastive Learning (GCL), a novel loss function that enhances multimodal retrieval performance by leveraging existing image-caption datasets.