Smooth Regularization for Efficient Video Recognition

Gil Goldman, Raja Giryes, Mahadev Satyanarayanan

Carnegie Mellon University · Tel Aviv University

Keywords: Video recognition, Temporal smoothness regularization, Temporal coherence, Lightweight video models

Abstract

We propose a smooth regularization technique that instills a strong temporal inductive bias in video recognition models and particularly benefits lightweight architectures. Our method encourages smoothness in the intermediate-layer embeddings of consecutive frames by modeling their changes as a Gaussian Random Walk (GRW). This penalizes abrupt representational shifts, thereby promoting low-acceleration solutions that better align with the natural temporal coherence inherent in videos.
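To make the idea concrete, here is a minimal sketch under one reading of the description above. It assumes that the frame-to-frame *changes* (velocities) of the embeddings follow a Gaussian random walk, (z[t+1] - z[t]) = (z[t] - z[t-1]) + eps_t with eps_t ~ N(0, sigma^2 I). Under that assumption, the negative log-likelihood reduces to a squared penalty on the discrete second differences (accelerations). The function names `second_differences` and `grw_smoothness_penalty`, the noise scale `sigma`, and the per-step averaging are illustrative assumptions, not the authors' implementation. The sketch uses plain Python lists so that it runs anywhere; a real training loop would operate on framework tensors.

```python
from typing import Sequence

Vector = Sequence[float]


def second_differences(embeddings: Sequence[Vector]) -> list[list[float]]:
    """Discrete acceleration z[t+1] - 2*z[t] + z[t-1] for every interior frame t."""
    return [
        [p - 2.0 * c + n for p, c, n in zip(prev, cur, nxt)]
        for prev, cur, nxt in zip(embeddings, embeddings[1:], embeddings[2:])
    ]


def grw_smoothness_penalty(embeddings: Sequence[Vector], sigma: float = 1.0) -> float:
    """Mean per-step negative log-likelihood (up to an additive constant) of a
    sequence of per-frame embeddings, ASSUMING their changes follow a Gaussian
    random walk with step noise N(0, sigma^2 I).

    Each step's noise term equals the second difference, so it contributes
    ||z[t+1] - 2*z[t] + z[t-1]||^2 / (2 * sigma^2). Constant-velocity
    trajectories therefore incur zero penalty, and abrupt jumps are penalized.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    accels = second_differences(embeddings)
    if not accels:  # fewer than three frames: no acceleration to penalize
        return 0.0
    total = sum(sum(x * x for x in a) for a in accels)
    return total / (2.0 * sigma * sigma * len(accels))


if __name__ == "__main__":
    smooth = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # constant velocity
    jerky = [[0.0], [0.0], [5.0], [0.0]]                        # sudden spike
    print(grw_smoothness_penalty(smooth))  # 0.0
    print(grw_smoothness_penalty(jerky))   # 31.25
```

During training, such a term would typically be added to the classification loss as `task_loss + weight * grw_smoothness_penalty(frame_embeddings)`. The weight, the layer whose embeddings are regularized, and the choice between summing and averaging over steps are all assumptions here. Penalizing accelerations rather than velocities still lets representations drift steadily over time, while discouraging sudden jumps between frames.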

By leveraging this enforced smoothness, lightweight models can more effectively capture complex temporal dynamics. Applied to such models, our technique yields a 3.8%–6.4% accuracy improvement on Kinetics-600. Notably, the MoViNets model family trained with our smooth regularization improves upon the current state of the art by 3.8%–6.1% within their respective FLOP constraints, while MobileNetV3 and the MoViNets-Stream family achieve gains of 4.9%–6.4% over prior state-of-the-art models with comparable memory footprints.

Our code and models are available at https://github.com/gilgoldm/grw-smoothing.