2 papers across 2 sessions
Contextualizing multimodal LLM (MLLM)-based agents with grounded scene graphs boosts their performance.
EgoExOR is the first dataset to combine egocentric and exocentric views in the operating room (OR), with gaze, hand pose, video, and dense scene graphs across 84k frames, enabling holistic, human-centric surgical scene understanding.