We present the first comprehensive benchmark for long-context vision-language models.