Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems

Utkarsh U. Chavan, Prashant Trivedi, Nandyala Hemachandra

Indian Institute of Technology Bombay· University of Petroleum and Energy Studies, Dehradun Uttarakhand

Lower Bounds Decentralized MARL Stochastic Shortest Path Reinforcement Learning Feature Design

Abstract

Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored.

In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies.

Our main contribution is the first regret lower bound for this setting based on the construction of hard-to-learn instances for any number of agents,

n

. Our regret lower bound of

Ω (K http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice">)

, over

K

episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.