We present VideoMarathon, a large-scale synthetic video instruction-following dataset for hour-long video-language understanding, and propose Hour-LLaVA, a Video-LLM that enables efficient video-language modeling.