PhD student, Tsinghua University
1 paper at NeurIPS 2025
The NeurIPS 2025 paper presents the first asymptotically correct simultaneous confidence region for off-policy evaluation in reinforcement learning.