1 paper across 1 session
First multi-modal (vision, audio, digital context, longitudinal) scripted dataset and benchmark for goal inference for wearable assistant agents.