Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies

Ziye Wang, Li Kang, Yiran Qin, Jiahua Ma, Zhanglin Peng, Lei Bai, Ruimao Zhang

Sun Yat-sen University · University of Hong Kong · Shanghai Jiao Tong University · The Chinese University of Hong Kong, Shenzhen · Shanghai AI Laboratory

3D Gaussian Reconstruction · Diffusion Policy · 3D Scene Representation · Robot Action Generation

NeurIPS ⋅ OpenReview

Abstract

Despite significant advances in robotic policy generation, effective coordination in embodied multi-agent systems remains a fundamental challenge, particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to reconcile fine-grained local control with comprehensive scene understanding, which limits scalability and compromises collaboration quality.

In this paper, we present GauDP, a novel Gaussian-image synergistic representation that enables scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP reconstructs a globally consistent 3D Gaussian field from local-view RGB images, allowing all agents to dynamically query task-relevant features from a shared scene representation. This design supports both fine-grained control and globally coherent behavior without requiring additional sensing modalities.
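To make the idea of agents querying a shared scene representation concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the query is performed, so the sketch assumes a simple pinhole projection of the shared Gaussian centers into each agent's camera, with features mean-pooled per pixel. All names (`project_points`, `query_gaussian_features`, the toy intrinsics `K`) are hypothetical.

```python
import numpy as np

def project_points(points_world, K, T_world_to_cam):
    """Pinhole-project Nx3 world points; returns (Nx2 pixel coords, N depths)."""
    homo = np.hstack([points_world, np.ones((len(points_world), 1))])
    cam = (T_world_to_cam @ homo.T).T[:, :3]
    depth = cam[:, 2]
    safe = np.where(depth > 1e-6, depth, 1.0)  # avoid division by zero
    uv = (K @ cam.T).T[:, :2] / safe[:, None]
    return uv, depth

def query_gaussian_features(means, feats, K, T_world_to_cam, hw):
    """Gather features of Gaussians visible to one agent into an HxWxC map.

    Assumption: each Gaussian contributes only at its projected center,
    and features landing on the same pixel are averaged.
    """
    h, w = hw
    uv, depth = project_points(means, K, T_world_to_cam)
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    visible = (depth > 1e-6) & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    fmap = np.zeros((h, w, feats.shape[1]))
    count = np.zeros((h, w, 1))
    np.add.at(fmap, (rows[visible], cols[visible]), feats[visible])
    np.add.at(count, (rows[visible], cols[visible]), 1.0)
    return fmap / np.maximum(count, 1.0)

# Toy shared field: three Gaussian centers with 2-D features (illustrative values).
means = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, -1.0], [0.5, 0.0, 2.0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])

# Agent A looks along +z from the world origin.
fmap_a = query_gaussian_features(means, feats, K, np.eye(4), (64, 64))

# Agent B has a different pose (shifted 0.5 along x) but queries the same field.
T_b = np.eye(4)
T_b[0, 3] = -0.5
fmap_b = query_gaussian_features(means, feats, K, T_b, (64, 64))
```

Both agents read from the same `means` and `feats` arrays, so only the camera pose differs between queries. A full system would presumably render the Gaussians with their covariances and opacities rather than projecting centers alone; that part is omitted here.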

We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method achieves superior performance over existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.

Extensive ablations and visualizations further demonstrate the robustness and efficiency of our unified local-global perception framework for multi-agent embodied learning.