We present a theoretical framework for policy convergence in reinforcement learning (RL), which permits convergence of return distribution estimates.