4 papers across 3 sessions
This work presents the first asymptotically correct simultaneous confidence region for off-policy evaluation in reinforcement learning.