Poster Session 4 · Thursday, December 4, 2025 4:30 PM → 7:30 PM
#402
Estimation and Inference in Distributional Reinforcement Learning
Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
NeurIPS
Abstract
In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency.
We investigate distributional policy evaluation, aiming to estimate the complete distribution of the random return (denoted $\eta^\pi$) attained by a given policy $\pi$. We use the certainty-equivalence method to construct our estimator $\hat\eta^\pi$, based on a generative model. In this circumstance we need a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2p}(1-\gamma)^{2p+2}}\right)$ to guarantee that the supremum $p$-Wasserstein metric between $\hat\eta^\pi$ and $\eta^\pi$ is less than $\epsilon$ with high probability. This implies the distributional policy evaluation problem can be solved with sample efficiency.
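To make the estimator concrete, here is a minimal Python sketch (not from the paper) of certainty-equivalence distributional policy evaluation in a tabular MDP: the generative model is queried n_samples times per state-action pair to form an empirical transition kernel, and the return distribution of the resulting plug-in MDP is approximated by Monte Carlo rollouts. All names (sample_next_state, rewards, n_rollouts, ...) are hypothetical, and deterministic known rewards are assumed to keep the sketch short.

import numpy as np

def certainty_equivalence_returns(sample_next_state, rewards, policy,
                                  n_states, n_actions, gamma,
                                  n_samples=1000, n_rollouts=5000,
                                  horizon=200, rng=None):
    """Certainty-equivalence estimate of the return distribution eta^pi.

    sample_next_state(s, a) -> next state drawn from the generative model.
    rewards[s, a]           -> known deterministic reward (an assumption
                               made to keep this sketch short).
    policy[s]               -> action chosen by the evaluated policy pi.
    Returns, for each initial state, an array of Monte Carlo returns whose
    empirical law approximates the return distribution of the *empirical* MDP.
    """
    rng = np.random.default_rng(rng)
    # Step 1: build the empirical transition kernel P_hat from
    # n_samples generative-model draws per state-action pair.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1
    P_hat /= n_samples
    # Step 2: approximate the return distribution of the plug-in MDP
    # by rolling out pi inside the empirical model.
    returns = np.zeros((n_states, n_rollouts))
    for s0 in range(n_states):
        for i in range(n_rollouts):
            s, g, disc = s0, 0.0, 1.0
            # Truncate at `horizon`; the truncation error is O(gamma^horizon).
            for _ in range(horizon):
                a = policy[s]
                g += disc * rewards[s, a]
                disc *= gamma
                s = rng.choice(n_states, p=P_hat[s, a])
            returns[s0, i] = g
    return returns

The empirical law of returns[s0] then plays the role of $\hat\eta^\pi(s_0)$, and the supremum $p$-Wasserstein distance between this law and the true return law is the quantity the bound above controls.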
Also, we show that under different mild assumptions a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2}(1-\gamma)^{4}}\right)$ suffices to ensure that the supremum Kolmogorov-Smirnov metric and the supremum total variation metric between $\hat\eta^\pi$ and $\eta^\pi$ are below $\epsilon$ with high probability.
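For reference, the supremum metrics appearing in these guarantees can be written as follows (a sketch using standard definitions; the notation here is ours, not necessarily the paper's):

\[
  \overline{W}_p(\hat\eta^\pi, \eta^\pi)
    = \sup_{s \in \mathcal{S}} W_p\bigl(\hat\eta^\pi(s), \eta^\pi(s)\bigr), \qquad
  \overline{\mathrm{KS}}(\hat\eta^\pi, \eta^\pi)
    = \sup_{s \in \mathcal{S}} \sup_{x \in \mathbb{R}}
      \bigl| \hat F_s(x) - F_s(x) \bigr|,
\]
\[
  \overline{\mathrm{TV}}(\hat\eta^\pi, \eta^\pi)
    = \sup_{s \in \mathcal{S}} \sup_{B \in \mathcal{B}(\mathbb{R})}
      \bigl| \hat\eta^\pi(s)(B) - \eta^\pi(s)(B) \bigr|,
\]

where $\hat F_s$ and $F_s$ denote the CDFs of $\hat\eta^\pi(s)$ and $\eta^\pi(s)$, and $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra.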
Furthermore, we investigate the asymptotic behavior of $\hat\eta^\pi$. We demonstrate that the "empirical process" $\sqrt{n}\,(\hat\eta^\pi - \eta^\pi)$ converges weakly to a Gaussian process in the space of bounded functionals on a Lipschitz function class $\mathcal{F}_{W_1}$, and also in the spaces of bounded functionals on an indicator function class $\mathcal{F}_{\mathrm{KS}}$ and a bounded measurable function class $\mathcal{F}_{\mathrm{TV}}$, when some mild conditions hold.
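In symbols, the weak-convergence statement has the form of a functional central limit theorem (a sketch in our notation, with $n$ the sample size per state-action pair):

\[
  \sqrt{n}\,(\hat\eta^\pi - \eta^\pi) \rightsquigarrow \mathbb{G}
  \quad \text{in } \ell^\infty(\mathcal{F}), \qquad
  \mathcal{F} \in \{\mathcal{F}_{W_1},\, \mathcal{F}_{\mathrm{KS}},\, \mathcal{F}_{\mathrm{TV}}\},
\]

where $\mathbb{G}$ is a tight Gaussian process and $\ell^\infty(\mathcal{F})$ denotes the space of bounded functionals on $\mathcal{F}$; results of this form underpin statistical inference for functionals of $\eta^\pi$.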