We present the first pure Mamba-based architecture for video action detection, achieving Transformer-level performance with significantly reduced computation, inference time and memory costs.
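The summary does not describe the architecture itself. As a rough illustration of where the savings come from, the sketch below implements a generic, simplified Mamba-style selective scan (the S6 recurrence) over a sequence of tokens, such as flattened spatio-temporal video tokens. It is not the authors' model. The function name `selective_scan`, the parameter names, and the shapes are hypothetical choices made for this example. The point it shows is that the cost grows linearly with the number of tokens L, roughly O(L·D·N), whereas self-attention over the same tokens grows quadratically in L.

```python
import numpy as np


def selective_scan(x, A, W_B, W_C, W_delta):
    """Simplified Mamba-style selective scan (illustrative sketch, not the paper's model).

    x:       (L, D) input tokens, e.g. flattened spatio-temporal video tokens
    A:       (D, N) real state matrix, negative entries for stability (diagonal per channel)
    W_B:     (D, N) hypothetical projection producing the input-dependent B_t
    W_C:     (D, N) hypothetical projection producing the input-dependent C_t
    W_delta: (D, D) hypothetical projection producing the per-channel step size delta_t
    Returns: (L, D) outputs; one pass over the tokens, so cost is linear in L.
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))  # hidden state: one N-dimensional state per channel
    y = np.empty((L, D))
    for t in range(L):
        xt = x[t]                                # current token, shape (D,)
        delta = np.log1p(np.exp(xt @ W_delta))   # softplus keeps the step size positive, (D,)
        B = xt @ W_B                             # input-dependent input matrix, (N,)
        C = xt @ W_C                             # input-dependent readout, (N,)
        A_bar = np.exp(delta[:, None] * A)       # zero-order-hold discretisation of A, (D, N)
        B_bar = delta[:, None] * B[None, :]      # simplified (Euler) discretisation of B, (D, N)
        h = A_bar * h + B_bar * xt[:, None]      # recurrent state update, no L x L interaction
        y[t] = h @ C                             # per-channel readout, (D,)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, D, N = 16, 8, 4
    x = rng.standard_normal((L, D))
    A = -np.exp(rng.standard_normal((D, N)))
    W_B = 0.1 * rng.standard_normal((D, N))
    W_C = 0.1 * rng.standard_normal((D, N))
    W_delta = 0.1 * rng.standard_normal((D, D))
    print(selective_scan(x, A, W_B, W_C, W_delta).shape)
```

Because each step only updates a fixed-size state, memory for the recurrence stays constant in L. A practical implementation would use a parallel (hardware-aware) scan rather than this Python loop.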