1 paper across 1 session
We present a robust method for directly optimizing pass@k with reinforcement learning, supported by theory and real-world experiments.
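For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled attempts solves a problem. The sketch below shows the commonly used unbiased estimator, computed from n samples of which c are correct. It only illustrates the quantity being optimized and is not the method proposed in the paper. The function name `pass_at_k` and its arguments are assumptions chosen for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which are correct.

    Uses the standard combinatorial estimator 1 - C(n - c, k) / C(n, k):
    one minus the probability that a random size-k subset of the n
    samples contains no correct attempt.
    """
    if n <= 0 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require n > 0, 0 <= c <= n, and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is exactly 1.0
    # whenever every size-k subset must include a correct attempt.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples, 3 correct, evaluated at k = 1 and k = 5.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

The estimate depends on the samples only through the count c. This sketch only scores a fixed set of samples and performs no optimization.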