Full Professor, University of Washington, Seattle
1 paper at NeurIPS 2025
We develop a variance-aware, gap-dependent regret bound for tabular MDPs with improved dependence on the horizon $H$.