4 papers across 3 sessions
We propose the first provably efficient and episode-wise safe reinforcement learning (RL) algorithm for linear constrained MDPs.
This paper establishes the mathematical foundation of value decomposition in multi-agent reinforcement learning (MARL).
We characterize sample complexities for average-reward offline RL with function approximation for weakly communicating MDPs.