Assistant Professor, Hong Kong University of Science and Technology
2 papers at NeurIPS 2025
We develop a variance-aware gap-dependent regret bound with better $H$ dependence for tabular MDPs.
We provide a computationally efficient algorithm that achieves $O(H)$ deployment cost with polynomial sample complexity.