3 papers across 3 sessions
MimeQA is a question-answering dataset built on mime videos to evaluate video LLMs' nonverbal social reasoning capabilities; models perform below human performance.
Robo2VLM is a framework that generates VQA datasets for robotic manipulation using real-world robot trajectories and non-visual sensor data.
LibriBrain is the largest single-subject, non-invasive MEG dataset (over 50 hours), recorded while the subject listened to naturalistic speech, and is designed to advance scalable and reproducible machine learning methods for speech decoding from brain activity.