PhD student, Department of Computer Science, University of Washington
1 paper at NeurIPS 2025
We need only one training example for RLVR (reinforcement learning with verifiable rewards) on LLMs to achieve significant improvements on math tasks.