Refusal directions in LLMs transfer across languages, revealing a shared jailbreak mechanism and underscoring the need for stronger multilingual safety.
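To make the idea concrete, below is a minimal sketch of the difference-of-means approach commonly used to extract a refusal direction from hidden states and project it out. The function names, tensor shapes, and layer choice are illustrative assumptions, not the paper's code.

```python
import torch

def refusal_direction(h_harmful: torch.Tensor, h_harmless: torch.Tensor) -> torch.Tensor:
    """Difference-of-means refusal direction (a common extraction recipe).

    h_harmful, h_harmless: (n_prompts, d_model) hidden states collected at a
    chosen layer and token position for harmful vs. harmless prompts.
    Returns a unit vector in activation space.
    """
    direction = h_harmful.mean(dim=0) - h_harmless.mean(dim=0)
    return direction / direction.norm()

def ablate(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along `direction`.

    Projecting the refusal direction out of the residual stream is the
    standard probe for whether that single direction mediates refusal.
    hidden: (n, d_model); direction: (d_model,) unit vector.
    """
    return hidden - (hidden @ direction).unsqueeze(-1) * direction
```

If the same direction, extracted from English prompts, also suppresses refusal when ablated on prompts in other languages, that is the cross-lingual transfer the summary describes.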