Assistant Professor, ShanghaiTech University
1 paper at NeurIPS 2025
We propose Rectified Policy Optimization (RePO), which replaces the average safety metric with stricter safety constraints to mitigate "safety compensation".
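As an illustration only, the sketch below shows why an average safety metric can hide violations. It assumes hypothetical per-prompt safety costs where a positive value means unsafe and the limit is 0. A very safe response on one prompt can offset an unsafe response on another, so the average passes while an individual prompt fails. The `rectified_violation` helper is a hypothetical stand-in for a stricter, per-prompt constraint. It is not RePO's actual objective.

```python
from typing import Sequence


def average_constraint_ok(costs: Sequence[float], limit: float = 0.0) -> bool:
    """Average-style check: only the mean cost across prompts must stay within the limit."""
    return sum(costs) / len(costs) <= limit


def per_prompt_constraint_ok(costs: Sequence[float], limit: float = 0.0) -> bool:
    """Stricter check: every individual prompt's cost must stay within the limit."""
    return all(c <= limit for c in costs)


def rectified_violation(costs: Sequence[float], limit: float = 0.0) -> float:
    """Hypothetical illustration: mean of the positive part of each prompt's violation.

    Safe prompts contribute zero instead of a negative offset, so they cannot
    compensate for unsafe ones.
    """
    return sum(max(0.0, c - limit) for c in costs) / len(costs)


# Hypothetical per-prompt safety costs (positive = unsafe), for illustration only.
costs = [-2.0, -1.5, 3.0]
print(average_constraint_ok(costs))     # True: the mean is about -0.17
print(per_prompt_constraint_ok(costs))  # False: the third prompt violates the limit
print(rectified_violation(costs))       # 1.0
```

The average check passes even though one prompt is unsafe, which is the compensation effect. The rectified quantity stays positive whenever any prompt violates the limit.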