1 paper across 1 session
To train an adaptive video tokenizer, we introduce probabilistic taildrop to inject visual complexity prior to the tokenizer and incorporate GRPO for post-training, which further boosts efficiency in a task-aware adaptive manner.