Associate Professor, The University of Hong Kong
5 papers at NeurIPS 2025
We introduce an embodied agent for open-world mobile manipulation, built on a VLM fine-tuned with agentic data synthesis. It unifies scene understanding, state tracking, and action generation, and achieves state-of-the-art results.
A unified multimodal model based purely on discrete flow matching, achieving performance comparable to autoregressive (AR) MLLMs.