PhD student, Xiamen University
1 paper at NeurIPS 2025
This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO).