3 papers across 2 sessions
We provide a policy optimization algorithm with O(\epsilon^{-4}) iteration complexity for robust constrained Markov decision processes.
We contribute provable guarantees that regularized policy gradient methods converge to approximate Nash equilibria in imperfect-information extensive-form zero-sum games.
We establish the first sample complexity bounds for private policy optimization.