2 papers across 2 sessions
We contribute provable guarantees that regularized policy gradient methods converge to approximate Nash equilibria in imperfect-information extensive-form zero-sum games.
We introduce URB, a standardized benchmark for evaluating multi-agent reinforcement learning (MARL) in urban routing with autonomous vehicles across 29 real-world traffic networks, revealing that current state-of-the-art algorithms struggle both to outperform humans and to scale effectively.