PhD student, Nanyang Technological University
1 paper at NeurIPS 2025
Exploiting LLMs' tendency to overfit, we jailbreak them by fine-tuning on only ten benign QA pairs.