PhD student, Mila
1 paper at NeurIPS 2025
We present a theoretical framework for policy convergence in RL, which permits convergence of return-distribution estimates.