MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
#4610 · Yang Shi, Huanqian Wang, Xie, Huanyao Zhang, Lijie Zhao, Yifan Zhang, Xinfeng Li, Chaoyou Fu, Zhuoer Wen, Wenting Liu, Zhuoran Zhang, Xinlong Chen, Bohan Zeng, Sihan Yang, Yushuo Guan, Zhang Zhang, Liang Wang, Haoxuan Li, Zhouchen Lin, Yuanxing Zhang, Pengfei Wan, Haotian Wang, Wenjing Yang
A novel, bilingual, and comprehensive benchmark designed to assess MLLMs’ OCR-based capabilities in video scenarios.