2 papers across 2 sessions
Aha! enables real-time, autoregressive highlight detection in continuous video using language tasks. Using a VLM and a Dynamic SinkCache for constant memory, it achieves SOTA on benchmarks & shows robotics potential.