Associate Professor, University of Surrey
3 papers at NeurIPS 2025
ViMaR is a two-stage, value-guided inference framework that uses margin-based rewards to generate captions faster, more accurately, and with fewer hallucinations, enabling scalable and self-improving vision–language models.
We propose a novel framework for responsible text-to-image generation that applies a dual-module transformation to the intermediate bottleneck representations of diffusion models.
We introduce CG-SSL, a concept-guided self-supervised learning framework that aligns meaningful image regions across views, achieving state-of-the-art performance on dense prediction tasks.