logo
today local_bar
Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#5409

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

NeurIPS Poster OpenReview

Abstract

Large Language Model (LLM)-based multi-agent systems show promise for automating real-world tasks but struggle to transfer across domains due to their domain-specific nature. Current approaches face two critical shortcomings: they require complete architectural redesign and full retraining of all components when applied to new domains.
We introduce Workforce, a hierarchical multi-agent framework that decouples strategic planning from specialized execution through a modular architecture comprising:
  1. a domain-agnostic Planner for task decomposition,
  2. a Coordinator for subtask management, and
  3. specialized Workers with domain-specific tool-calling capabilities.
This decoupling enables cross-domain transferability during both inference and training phases: During inference, Workforce seamlessly adapts to new domains by adding or modifying worker agents; For training, we introduce Optimized Workforce Learning (OWL), which improves generalization across domains by optimizing a domain-agnostic planner with reinforcement learning from real-world feedback.
To validate our approach, we evaluate Workforce on the GAIA benchmark, covering various realistic, multi-domain agentic tasks. Experimental results demonstrate Workforce achieves open-source state-of-the-art performance (69.70%), outperforming commercial systems like OpenAI's Deep Research by 2.34%. More notably, our OWL-trained 32B model achieves 52.73% accuracy (+16.37%) and demonstrates performance comparable to GPT-4o on challenging tasks.
To summarize, by enabling scalable generalization and modular domain transfer, our work establishes a foundation for the next generation of general-purpose AI assistants. Our code is available at Anonymous URL, and our data is available at Anonymous URL.
Poster