2 papers across 2 sessions
We propose ETT, an end-to-end tokenizer tuning approach that jointly optimizes vision tokenization and the target autoregressive tasks.
This paper introduces a unified visual tokenizer to facilitate the unification of visual generation and understanding within a single autoregressive framework.