logo
today local_bar
Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#3314

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

NeurIPS OpenReview

Abstract

We consider gap-dependent regret bounds for episodic MDPs.
We show that the Monotonic Value Propagation (MVP) algorithm (Zhang et al. 2024) achieves a variance-aware gap-dependent regret bound of $+SAH4(SH))logK),HSAK\tilde{O}\mathsf{poly} \log (S, A, H, 1 / \Delta\_{\mathrm{min}}, 1 / \delta)\Deltah(s,a) =Vh^ (a) - Qh^ (s, a)\Delta{\mathrm{min}} := \min{\Deltah (s,a) > 0} \Deltah(s,a)\mathtt{Var}\_{\max}^{\textup{c}}(\pi, h, s)\pish\mathtt{Var}\_{\max}^{\textup{c}}(h, s)$ pair.
Our result stems from a novel analysis of the weighted sum of the suboptimality gap and can be potentially adapted for other algorithms.
To complement the study, we establish a lower bound of $\mathtt{Var}\_{\max}^{\textup{c}}(h, s)$) approaches zero.