3 papers across 3 sessions
We present MimeQA, a question-answering dataset built on mime videos for evaluating the nonverbal social reasoning capabilities of video LLMs, and find that current models fall short of human performance.
We introduce TwinMarket, an LLM-based multi-agent framework that simulates socio-economic systems by modeling how individual behaviors interact to produce emergent market dynamics such as bubbles and crashes.