4 papers across 2 sessions
We show that adapting vision foundation models via self-supervised fine-tuning on simple object-centric videos substantially improves representation quality without any labels.
STSBench is a benchmark that evaluates the ability of multi-modal large language models to reason about spatio-temporal actions.
We introduce CG-SSL, a concept-guided self-supervised learning framework that aligns meaningful image regions across views, achieving state-of-the-art performance on dense prediction tasks.