Principal Researcher, PBVR, Beijing University of Aeronautics and Astronautics
2 papers at NeurIPS 2025
We propose Rectified Policy Optimization (RePO), which mitigates "safety compensation" by replacing the average safety metric with stricter safety constraints.