1 paper across 1 session
A new benchmark for evaluating the visual captioning ability of MLLMs that considers both correctness and thoroughness.