MLLM agents

2 papers across 2 sessions

Poster Session 1

1 paper

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

ESCA: Contextualizing Embodied Agents via Scene-Graph Generation

#4908 Spotlight · Jiani Huang, Amish Sethi, Matthew Kuo, Mayank Keoliya, Neelay Velingker, JungHo Jung, Ser Nam Lim, Ziyang Li, Mayur Naik

Contextualizing MLLM-based agents with grounded scene graphs boosts their performance.

Poster Session 4

1 paper

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C, D, E

Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

#2513 · Yaxin Luo, Zhaoyi Li, Jiacheng Liu, Jiacheng Cui, Xiaohan Zhao, Zhiqiang Shen

We present Open CaptchaWorld, a benchmark that tests multimodal LLM agents on solving real-world CAPTCHAs via multi-step reasoning and interaction, revealing large gaps between current models and human performance.