Full Professor, Massachusetts Institute of Technology
2 papers at NeurIPS 2025
We present MimeQA, a question-answering dataset of mime videos for evaluating the nonverbal social reasoning capabilities of video LLMs, and find that models perform below human level.
A programming framework that compiles an LLM-based agent workflow into a search space at runtime, enabling independent experimentation with different inference-time search strategies layered on top of the workflow.