1 paper across 1 session
Aha! enables real-time, autoregressive highlight detection in continuous video using language tasks. Using a VLM and a Dynamic SinkCache for constant memory, it achieves SOTA on benchmarks & shows robotics potential.