Researcher, SenseTime
2 papers at NeurIPS 2025
We propose Rectified Policy Optimization (RePO), which mitigates "safety compensation" by replacing the average safety metric with stricter safety constraints.