Shaopeng Fu

Intern, Microsoft Research Asia

1 paper at NeurIPS 2025

Homepage · OpenReview · Semantic Scholar · Google Scholar

Poster Session 6

1 paper

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence

#4904 · Shaopeng Fu, Liang Ding, Jingfeng Zhang, Di Wang

We find that efficient "short-length" LLM adversarial training is effective for defending against "long-length" jailbreak attacks, a result supported by both theoretical and empirical evidence.