6 papers across 3 sessions
We present a vision-centric token compression method for LLMs, inspired by the human selective reading strategy.
DIET makes LLMs more token-efficient by using problem difficulty to dynamically guide compression during reinforcement learning, boosting reasoning performance and enabling superior inference scaling under fixed budgets.
We propose a progressive consistency distillation framework that significantly reduces the computational cost of MLLMs while preserving strong performance.
HoliTom introduces a training-free holistic outer-inner token merging framework for video LLMs, significantly accelerating inference with negligible performance degradation.