1 paper across 1 session
This work presents the first asymptotically correct simultaneous confidence region for off-policy evaluation in reinforcement learning.