We introduce ComPABench to evaluate the compositional reasoning of vision-language models (VLMs). We show that existing post-training methods struggle on this task, whereas enhancing vision-text alignment and using progress rewards improve the compositional ability of RL-trained models.