2 papers across 1 session
We introduce TRoVe, an automated approach for discovering error-inducing static feature biases learned by temporal VLMs.
We develop a CLIP model that achieves state-of-the-art (SotA) zero-shot recognition on both images and videos. Using its strong, general features, we further build SotA encoders for language and spatial tasks.