2 papers across 2 sessions
We characterize sample complexities of average-reward offline RL with function approximation in weakly communicating MDPs.