We propose an efficient strategy for adversarial finetuning of the CLIP text encoder, enabling robustness in zero-shot classification, text-to-image retrieval and text-to-image generation.