2 papers across 2 sessions
We develop a CLIP model that is SotA on both image and video zero-shot recognition. Using its strong, general features, we further build SotA encoders for language and spatial tasks.
This paper introduces a comprehensive benchmark (COLORBENCH) and detailed analysis to systematically evaluate how well VLMs perceive, reason about, and robustly handle colors.