We introduce ExAct, a benchmark for evaluating video-language models on expert-level understanding of fine-grained physical human activities across diverse real-world domains.