Multimodal Generation and Understanding

2 papers across 2 sessions

Poster Session 1

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

#5003 · Wenxuan Wang, Fan Zhang, Yufeng Cui, Haiwen Diao, Zhuoyan Luo, Huchuan Lu, Jing Liu, Xinlong Wang

We propose ETT, an end-to-end tokenizer tuning approach that enables joint optimization between vision tokenization and target autoregressive tasks.

Poster Session 3

1 paper

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C, D, E

UniTok: a Unified Tokenizer for Visual Generation and Understanding

#4815 Spotlight · Chuofan Ma, Yi Jiang, Junfeng Wu, Jihan Yang, Xin Yu, Zehuan Yuan, Bingyue Peng, Xiaojuan Qi

This paper introduces a unified visual tokenizer that supports both visual generation and understanding within a single autoregressive framework.