Full Professor, East China Normal University
2 papers at NeurIPS 2025
This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO).
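The summary above does not say how completions are pruned. As a rough illustration only, the sketch below assumes that GRPO scores each completion by its reward normalized within its sampling group, and that pruning keeps the completions whose advantage has the largest absolute value, so that fewer completions go through the expensive policy update. The function names, the `keep` parameter, and the use of the population standard deviation are hypothetical choices for this example, not the paper's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group (assumed GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def prune_completions(rewards, keep):
    """Return indices and advantages of the `keep` completions with the
    largest absolute advantage (hypothetical pruning rule)."""
    adv = group_relative_advantages(rewards)
    # Rank completion indices by |advantage|, largest first.
    ranked = sorted(range(len(rewards)), key=lambda i: abs(adv[i]), reverse=True)
    kept = sorted(ranked[:keep])  # restore original order for readability
    return kept, [adv[i] for i in kept]


# Example: four completions for one prompt, keep the two most informative.
kept_idx, kept_adv = prune_completions([1.0, 0.0, 0.5, 0.5], keep=2)
```

In this example the two completions with rewards equal to the group mean have zero advantage and are dropped, which is the intuition behind pruning: they would contribute little to the gradient under the assumed rule.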