Researcher, Google
1 paper at NeurIPS 2025
We provide a robust method for directly optimizing pass@k with reinforcement learning, supported by theory and real-world experiments.