PhD student, Northeastern University
1 paper at NeurIPS 2025
We introduce an extreme token-reduction task and a discrete representation (VQToken) that adaptively compresses video token sequences by 99.93% of their original length with only a 0.66% accuracy drop.