Researcher, Amazon
1 paper at NeurIPS 2025
STSBench is a benchmark that evaluates how well multimodal large language models (MLLMs) reason about spatio-temporal actions.