PhD student, University of Washington
1 paper at NeurIPS 2025
We derive a variance-aware, gap-dependent regret bound for tabular MDPs with improved dependence on the horizon $H$.