Poster Session 2 · Wednesday, December 3, 2025, 4:30 PM – 7:30 PM
#5201
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi LI, Ge Zhang, Yinghao Ma, Ruibin Yuan, King Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Moore Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Yidan WEN, Yanghai Wang, Shihao Li, Zhaoxiang Zhang, Ruibo Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin
Abstract
Recent advancements in multimodal large language models (MLLMs) have focused on integrating multiple modalities, yet their ability to simultaneously process and reason across different inputs remains underexplored. We introduce OmniBench, a novel benchmark designed to evaluate models’ ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as omni-language models (OLMs).
OmniBench features high-quality human annotations that require integrated understanding across all modalities. Our evaluation reveals that:
- open-source OLMs show significant limitations in instruction-following and reasoning in tri-modal contexts; and
- most baseline models perform poorly (below 50% accuracy) even when the image and audio inputs are replaced with textual alternatives.
To address these limitations, we develop OmniInstruct, a 96K-sample instruction-tuning dataset for training OLMs. We advocate developing more robust tri-modal integration techniques and training strategies to enhance OLM performance. Code and data are available at https://m-a-p.ai/OmniBench/.
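For concreteness, here is a minimal sketch of how multiple-choice accuracy might be computed over tri-modal benchmark items. The item schema and the model interface are illustrative assumptions, not the official OmniBench loader or evaluation API:

```python
# Illustrative sketch only: the item schema and model interface below are
# assumptions for demonstration, not the official OmniBench code.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TriModalItem:
    image_path: str        # visual input
    audio_path: str        # acoustic input
    question: str          # textual input
    options: Sequence[str]
    answer_idx: int        # index of the gold option


def accuracy(items: Sequence[TriModalItem],
             predict: Callable[[TriModalItem], int]) -> float:
    """Fraction of items for which the model selects the gold option."""
    correct = sum(predict(item) == item.answer_idx for item in items)
    return correct / len(items)


if __name__ == "__main__":
    # Dummy baseline that always picks the first option; a real OLM would
    # consume all three modalities before choosing an answer.
    items = [
        TriModalItem("scene.png", "clip.wav",
                     "What event links the image and the audio?",
                     ["a concert", "a lecture", "a storm", "a race"],
                     answer_idx=2),
    ]
    print(f"accuracy: {accuracy(items, lambda item: 0):.2%}")
```

The point of the harness is that `predict` must receive all three modalities at once; a baseline that scores near chance here, or one that fails even when images and audio are swapped for text descriptions, illustrates the gaps the benchmark is designed to expose.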