Postdoc, University of Surrey
2 papers at NeurIPS 2025
ViMaR is a two-stage, value-guided inference framework that uses margin-based rewards to produce faster, more accurate, and less hallucinatory captions, enabling scalable and self-improving vision–language models.
We introduce CG-SSL, a concept-guided self-supervised learning framework that aligns meaningful image regions across views, achieving state-of-the-art performance on dense prediction tasks.