1 paper across 1 session
We introduce the new task of recognizing chiral (temporally opposite) actions and propose a self-supervised recipe that adapts image models to produce compact, time-sensitive video descriptors.