3 papers across 2 sessions
We present a theoretical framework for policy convergence in reinforcement learning (RL) under which estimates of the return distribution can also converge.