STSBench is a benchmark that evaluates how well multi-modal large language models (MLLMs) reason about spatio-temporal actions.