We derive a variance-aware, gap-dependent regret bound for tabular MDPs with improved dependence on the horizon $H$.