3 papers across 2 sessions
To address the inefficiency caused by excessive visual tokens in LVLMs, we take an information-flow perspective that reveals how visual redundancy emerges dynamically during inference, and introduce a pruning method aligned with the model's inherent behavior that outperforms existing approaches.
A novel visual token pruning method that jointly maximizes the saliency and coverage of the selected visual tokens, better preserving semantic completeness.
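To make the objective concrete, here is a minimal greedy-selection sketch; the attention-derived saliency scores, the cosine-similarity coverage term, and the `alpha` trade-off weight are all illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def select_salient_and_covering(features, saliency, k, alpha=0.5):
    """Greedily pick k tokens, trading off saliency against coverage.

    features: (N, d) visual token embeddings
    saliency: (N,)  per-token importance (e.g., attention received)
    alpha:    assumed trade-off weight between the two terms
    """
    feats = F.normalize(features, dim=-1)
    sim = feats @ feats.T                    # pairwise cosine similarity
    n = features.size(0)
    covered = torch.zeros(n)                 # how well each token is covered so far
    selected, remaining = [], set(range(n))
    for _ in range(k):
        best, best_gain = None, -float("inf")
        for j in remaining:
            # Marginal coverage gain of adding token j (facility-location style).
            cov_gain = torch.clamp(sim[j] - covered, min=0).sum()
            gain = (alpha * saliency[j] + (1 - alpha) * cov_gain).item()
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        remaining.remove(best)
        covered = torch.maximum(covered, sim[best])
    return torch.tensor(selected)
```

Usage would look like `keep = select_salient_and_covering(vis_tokens, attn_scores, k=64)` to retain 64 tokens (names hypothetical); the coverage term keeps the selection from collapsing onto a few high-saliency regions.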
We propose CDPruner, a training-free visual token pruning method that accelerates MLLM inference by maximizing the conditional diversity of the retained tokens.
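Maximizing conditional diversity is commonly cast as greedy MAP inference over a conditional determinantal point process (DPP); the sketch below assumes a quality-diversity kernel built from hypothetical instruction-relevance scores and uses the fast greedy update of Chen et al. (2018), so it illustrates the idea rather than CDPruner's exact implementation.

```python
import torch
import torch.nn.functional as F

def conditional_diversity_select(features, relevance, k):
    """Greedy MAP inference on a conditional DPP kernel (illustrative).

    features:  (N, d) visual token embeddings
    relevance: (N,)  assumed relevance of each token to the instruction,
               conditioning the diversity objective on the prompt
    """
    feats = F.normalize(features, dim=-1)
    # Quality-diversity decomposition: L = diag(q) S diag(q)
    L = relevance[:, None] * (feats @ feats.T) * relevance[None, :]
    n = L.size(0)
    c = torch.zeros(n, 0)                    # incremental Cholesky-style factors
    d2 = L.diagonal().clone()                # marginal log-det gain per token
    selected = []
    for _ in range(k):
        j = int(torch.argmax(d2))
        selected.append(j)
        # Rank-one update of the remaining gains (fast greedy MAP).
        e = (L[:, j] - c @ c[j]) / d2[j].clamp_min(1e-12).sqrt()
        c = torch.cat([c, e[:, None]], dim=1)
        d2 = d2 - e.pow(2)
        d2[selected] = -float("inf")         # never re-pick a kept token
    return torch.tensor(selected)
```

Because each candidate's log-det gain shrinks once similar tokens are kept, the selection spreads across distinct, instruction-relevant image regions instead of clustering on the most attended patch.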