We introduce a novel embodied vision-language model (VLM) agent for open-world mobile manipulation. Its VLM, fine-tuned on data produced by agentic data synthesis, unifies scene understanding, state tracking, and action generation in a single model and achieves state-of-the-art results.