1 paper across 1 session
A novel visual token pruning method that jointly maximizes the saliency and coverage of the selected visual tokens to better preserve semantic completeness.
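To make "jointly maximizing saliency and coverage" concrete, here is a minimal sketch of one way such a selection could work. It is not the paper's algorithm: the objective, the greedy procedure, the cosine-based coverage term, and the names `prune_tokens`, `saliency`, and `lam` are all assumptions chosen for illustration. Each step keeps the token with the largest combined gain, where the coverage gain measures how much better the remaining tokens are represented once that token is kept.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def prune_tokens(features, saliency, k, lam=1.0):
    """Greedily keep k token indices, trading off saliency against coverage.

    Hypothetical objective (an assumption, not taken from the paper):
        F(S) = sum_{i in S} saliency[i]
               + lam * sum_j max_{i in S} max(0, cos(f_j, f_i))
    """
    n = len(features)
    k = min(k, n)
    # Pairwise similarities, clamped at zero so coverage is non-negative.
    sim = [[max(0.0, cosine(features[j], features[i])) for i in range(n)]
           for j in range(n)]
    covered = [0.0] * n  # best similarity of each token to the kept set so far
    kept = []
    for _ in range(k):
        best, best_gain = None, -math.inf
        for i in range(n):
            if i in kept:
                continue
            # Marginal coverage: how much token i improves each token's best match.
            cov_gain = sum(max(sim[j][i] - covered[j], 0.0) for j in range(n))
            gain = saliency[i] + lam * cov_gain
            if gain > best_gain:
                best, best_gain = i, gain
        kept.append(best)
        covered = [max(covered[j], sim[j][best]) for j in range(n)]
    return sorted(kept)


# Toy input: tokens 0 and 1 are near-duplicates with high saliency,
# token 2 is distinct but has low saliency.
features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
saliency = [0.9, 0.8, 0.1]
print(prune_tokens(features, saliency, k=2))           # saliency + coverage
print(prune_tokens(features, saliency, k=2, lam=0.0))  # saliency only
```

On this toy input, saliency-only selection keeps the two near-duplicate tokens, while the combined objective swaps one of them for the distinct token. That is the intuition behind preserving semantic completeness, under the assumptions stated above.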