1 paper across 1 session
We propose SuperCLIP, a simple and efficient extension to CLIP that adds classfication-based supervision to improve fine-grained image-text alignment without requiring extra annotations or significant computation.