MS student, Tsinghua University
1 paper at NeurIPS 2025
We propose a lifelong safety alignment framework in which a Meta-Attacker and a Defender co-evolve to uncover and defend against unseen jailbreaking strategies.