PhD student, Tsinghua University
2 papers at NeurIPS 2025
This work presents the first asymptotically correct simultaneous confidence region for off-policy evaluation in reinforcement learning.