Poster Session 4 · Thursday, December 4, 2025 4:30 PM → 7:30 PM
#3209
Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
Abstract
This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) under general parametrized policies with smooth and bounded policy gradients.
We propose a Primal-Dual Natural Actor-Critic algorithm that handles the constraints while ensuring a fast convergence rate.
In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $\tau_{\mathrm{mix}}$, is known to the learner. In the absence of knowledge of $\tau_{\mathrm{mix}}$, the achievable rates change to $\tilde{\mathcal{O}}(1/T^{0.5-\epsilon})$, provided that $T \geq \tilde{\mathcal{O}}(\tau_{\mathrm{mix}}^{2/\epsilon})$.
Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.
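To illustrate the primal-dual scheme underlying the abstract, the following is a minimal sketch on a toy single-state CMDP (a constrained bandit). The arm rewards and costs, step sizes, and exact-expectation updates are all illustrative assumptions, not the paper's algorithm, which uses sampled actor-critic estimates under general policy parametrization.

```python
import numpy as np

# Toy single-state CMDP (a constrained bandit): maximize average reward
# subject to an average-cost budget. All numbers are illustrative.
rewards = np.array([1.0, 0.6, 0.2])  # expected reward of each arm
costs = np.array([1.0, 0.4, 0.1])    # expected cost of each arm
budget = 0.5                          # constraint: average cost <= budget

theta = np.zeros(3)  # softmax policy logits (primal variable)
lam = 0.0            # Lagrange multiplier (dual variable)
eta_theta, eta_lam = 0.1, 0.1
T = 5000
avg_pi = np.zeros(3)  # running average of the policy (Cesaro iterate)

for t in range(T):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    avg_pi += pi / T
    # Lagrangian value of each arm: r(a) - lam * c(a)
    lagr = rewards - lam * costs
    # Primal ascent: for a tabular softmax policy, an NPG-style step
    # adds the (Lagrangian) advantage directly to the logits.
    theta += eta_theta * (lagr - pi @ lagr)
    # Dual descent on the constraint violation, projected onto lam >= 0
    lam = max(0.0, lam + eta_lam * (pi @ costs - budget))

print(f"avg cost {avg_pi @ costs:.3f} (budget {budget}), "
      f"avg reward {avg_pi @ rewards:.3f}")
```

The time-averaged policy is the object to which $\tilde{\mathcal{O}}(1/\sqrt{T})$-type guarantees are typically stated: its average cost hovers near the budget (complementary slackness keeps the constraint nearly tight), while its reward approaches that of the best feasible mixture of arms.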