Postdoc, The Chinese University of Hong Kong
5 papers at NeurIPS 2025
We introduce a novel embodied agent for open-world mobile manipulation, built on a VLM fine-tuned with data from agentic data synthesis. It unifies scene understanding, state tracking, and action generation, and achieves state-of-the-art results.
A lightweight, plug-and-play mapper that boosts the performance of open-vocabulary semantic segmentation (OVSS) with minimal computational overhead.
We systematically investigate the design space and scaling properties of native Multimodal Large Language Models (MLLMs) and introduce a novel MLLM that performs competitively with existing MLLMs.