Full Professor, Tsinghua University
3 papers at NeurIPS 2025
We propose a lifelong safety alignment framework where a Meta-Attacker and Defender co-evolve to uncover and defend against unseen jailbreaking strategies.
We propose RPEX, an offline-to-online method that improves the performance of offline-pretrained RL policies under a wide range of data corruptions.