3 papers across 3 sessions
The first multimodal, longitudinal, scripted dataset and benchmark (spanning vision, audio, and digital context) for goal inference with wearable assistant agents.