1 paper across 1 session
A novel vision-language pretraining method that explores ordering and continuity of videos for robot manipulation