PhD student, University of Maryland, College Park
1 paper at NeurIPS 2025
A new LLM jailbreak objective that enables more nuanced control over jailbroken responses, exploits the undergeneralization of safety alignment, and raises the success rate of existing jailbreaks from 14% to 80%.