1 paper across 1 session
This work adds the robot's perception of external forces as a new input modality to Vision-Language-Action (VLA) models, using a multimodal Mixture-of-Experts (MoE) architecture to capture subtle dynamic changes during physical interaction.
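As a rough illustration of the idea, the following minimal sketch shows how a force reading could be embedded as one extra token and routed, together with vision and language tokens, through a top-k gated MoE layer. Nothing here is taken from the paper: the names `ToyMoE` and `embed_force`, the 6-D force/torque input, the linear embedding, the token counts, and the top-2 routing are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def embed_force(wrench, dim, seed=1):
    """Project a 6-D force/torque reading to a single token.

    Assumption: a fixed random linear map stands in for a learned encoder.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(0.0, 0.1, (6, dim))
    return (wrench @ weights)[None, :]  # shape (1, dim)


class ToyMoE:
    """Hypothetical token-level MoE layer with top-k routing (illustrative only)."""

    def __init__(self, dim, num_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        self.router = rng.normal(0.0, 0.02, (dim, num_experts))
        self.experts = rng.normal(0.0, 0.02, (num_experts, dim, dim))

    def __call__(self, tokens):
        # tokens: (n_tokens, dim). Each token picks its own top-k experts.
        logits = tokens @ self.router                           # (n, E)
        top_idx = np.argsort(logits, axis=-1)[:, -self.top_k:]  # (n, k)
        top_logits = np.take_along_axis(logits, top_idx, axis=-1)
        gates = softmax(top_logits, axis=-1)                    # (n, k), rows sum to 1
        out = np.zeros_like(tokens)
        for slot in range(self.top_k):
            for e in range(self.experts.shape[0]):
                mask = top_idx[:, slot] == e
                if mask.any():
                    expert_out = np.tanh(tokens[mask] @ self.experts[e])
                    out[mask] += gates[mask, slot:slot + 1] * expert_out
        return out, top_idx, gates


dim = 16
rng = np.random.default_rng(42)
vision_tokens = rng.normal(size=(4, dim))    # placeholder image features
language_tokens = rng.normal(size=(3, dim))  # placeholder instruction features
force_token = embed_force(np.array([0.5, -1.2, 9.8, 0.01, 0.0, -0.03]), dim)

# Treat force as one more modality in the token sequence.
tokens = np.concatenate([vision_tokens, language_tokens, force_token], axis=0)
moe = ToyMoE(dim)
fused, expert_ids, gates = moe(tokens)
print(fused.shape, expert_ids[-1])  # output shape and experts chosen for the force token
```

Because routing is per token, the force token can be sent to different experts than the vision or language tokens, which is one plausible way an MoE layer could specialize on contact dynamics. How the actual model routes and fuses modalities is not specified in the summary above.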