1 paper across 1 session
Enhancing long video understanding through extreme compression: each selected frame is reduced to a single token.