We propose ETT, an end-to-end tokenizer tuning approach that enables joint optimization of vision tokenization and the target autoregressive tasks.
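The sentence above only names the idea. As a hedged illustration of what "joint optimization" can mean mechanically, the toy NumPy sketch below assumes a VQ-style tokenizer whose codebook embedding (rather than only its discrete index) is passed to a downstream head. A straight-through estimator then carries the task-loss gradient back into the encoder. Everything in it is an assumption for illustration: the linear encoder `W_e`, the helpers `quantize` and `joint_step`, the stand-in task head `W_h`, the decoder `W_d`, the reconstruction weight `lam`, and the commitment weight `beta` are hypothetical and do not reproduce ETT's actual architecture or losses.

```python
import numpy as np


def quantize(z, codebook):
    """Return the index and vector of the codebook entry nearest to z."""
    k = int(np.argmin(np.sum((codebook - z) ** 2, axis=1)))
    return k, codebook[k]


def joint_step(params, x, y, lam=1.0, beta=0.25, lr=0.01, tune_tokenizer=True):
    """One SGD step on L = L_task + lam * L_rec + VQ codebook/commitment terms.

    Hypothetical toy: the downstream head consumes the codebook embedding q,
    and a straight-through estimator passes dL/dq back to the encoder, so the
    task loss also updates the tokenizer when tune_tokenizer=True.
    """
    W_e, C, W_h, W_d = params["W_e"], params["C"], params["W_h"], params["W_d"]

    # Forward pass.
    z = W_e @ x                      # continuous tokenizer features
    k, q = quantize(z, C)            # forward value is the codebook embedding
    y_hat = W_h @ q                  # hypothetical stand-in for the AR model
    x_hat = W_d @ q                  # reconstruction decoder
    g_y, g_x = y_hat - y, x_hat - x
    loss = (0.5 * g_y @ g_y + lam * 0.5 * g_x @ g_x
            + 0.5 * (1.0 + beta) * np.sum((q - z) ** 2))

    # Backward pass (manual gradients).
    grads = {
        "W_h": np.outer(g_y, q),
        "W_d": lam * np.outer(g_x, q),
    }
    g_q = W_h.T @ g_y + lam * W_d.T @ g_x      # dL/dq from both objectives
    g_z = g_q + beta * (z - q)                 # straight-through + commitment
    grads["W_e"] = np.outer(g_z, x)
    grads["C"] = np.zeros_like(C)
    grads["C"][k] = q - z                      # codebook loss pulls C[k] toward z
    if not tune_tokenizer:
        grads["W_e"][:] = 0.0                  # frozen tokenizer: no updates
        grads["C"][:] = 0.0

    new_params = {name: p - lr * grads[name] for name, p in params.items()}
    return new_params, float(loss), grads
```

Setting `tune_tokenizer=False` mimics a conventional frozen-tokenizer setup in which only the downstream weights move. With it enabled, `grads["W_e"]` picks up the term `W_h.T @ g_y`, which is the task signal that end-to-end tuning is meant to exploit in this toy.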