3 papers across 2 sessions
We introduce the task of recognizing chiral (temporally opposite) actions and propose a self-supervised recipe that adapts image models to produce compact, time-sensitive video descriptors.
We present a dataset of 3D full-body poses and cooking videos, along with benchmarks for multimodal behavior understanding.