3 papers across 3 sessions
We present VideoMarathon, a large-scale synthetic video instruction-following dataset for hour-long video-language understanding, and propose Hour-LLaVA, a video LLM that enables efficient video-language modeling over hour-long videos.