2 papers across 2 sessions
We contribute provable guarantees that regularized policy gradient methods converge to approximate Nash equilibria in imperfect-information extensive-form zero-sum games.
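As a toy illustration of the regularized setting (a hypothetical sketch, not the paper's algorithm or analysis): entropy-regularized policy gradient with simultaneous updates on a two-player zero-sum matrix game. The entropy term makes the game operator strongly monotone, so the last iterate is pulled toward the regularized equilibrium, which for the symmetric game below is the uniform Nash equilibrium. The game, temperature, and step size are all illustrative choices.

```python
# Hypothetical sketch: entropy-regularized policy gradient in self-play on
# rock-paper-scissors (a zero-sum matrix game); NOT the paper's method.
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])  # payoff matrix for the row player

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

tau = 0.2   # entropy temperature (illustrative choice)
eta = 0.1   # step size (illustrative choice)
rng = np.random.default_rng(0)
theta_x = rng.normal(size=3)  # row player's logits
theta_y = rng.normal(size=3)  # column player's logits

for _ in range(10000):
    x, y = softmax(theta_x), softmax(theta_y)
    # gradients of the entropy-regularized payoffs w.r.t. the strategies
    gx = A @ y - tau * (np.log(x) + 1.0)      # row player maximizes
    gy = -A.T @ x - tau * (np.log(y) + 1.0)   # column player maximizes
    # chain rule through softmax: Jacobian is diag(p) - p p^T
    theta_x += eta * (np.diag(x) - np.outer(x, x)) @ gx
    theta_y += eta * (np.diag(y) - np.outer(y, y)) @ gy

x, y = softmax(theta_x), softmax(theta_y)
# the regularized equilibrium of this symmetric game is uniform (1/3, 1/3, 1/3),
# so the last iterates x and y should end up close to it
```

Without the entropy term, simultaneous gradient play on this game cycles; the regularization is what makes a last-iterate statement possible.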
We present the first parameter-free last-iterate convergence guarantees for Counterfactual Regret Minimization (CFR) algorithms.
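For context, CFR is built on regret matching, whose classical guarantee concerns the *average* strategy, not the last iterate; that is what makes last-iterate results notable. A minimal regret-matching self-play sketch on a zero-sum matrix game (an assumed illustrative setup, not the paper's algorithm) shows the classical average-iterate guarantee:

```python
# Hypothetical sketch: regret matching in self-play on rock-paper-scissors;
# the classical guarantee is that AVERAGE strategies approach Nash equilibrium.
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])  # payoff matrix for the row player

def rm_strategy(regrets):
    # play proportionally to positive cumulative regret; uniform if none
    pos = np.maximum(regrets, 0.0)
    s = pos.sum()
    return pos / s if s > 0 else np.full_like(regrets, 1.0 / len(regrets))

rng = np.random.default_rng(1)
rx = rng.random(3)   # row player's cumulative regrets (random start to
ry = rng.random(3)   # avoid the degenerate all-uniform fixed point)
avg_x = np.zeros(3)
avg_y = np.zeros(3)
T = 50000
for _ in range(T):
    x, y = rm_strategy(rx), rm_strategy(ry)
    ux = A @ y            # row player's per-action utilities
    uy = -A.T @ x         # column player's per-action utilities
    rx += ux - x @ ux     # regret update: action utility minus realized value
    ry += uy - y @ uy
    avg_x += x
    avg_y += y
avg_x /= T
avg_y /= T
# the average strategies approach the uniform Nash equilibrium of the game,
# at the classical O(1/sqrt(T)) rate; the per-round strategies may still cycle
```

The averaging step is exactly what parameter-free last-iterate results remove: they certify the current strategies themselves, not a running average.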