PhD student, Ludwig-Maximilians-Universität München
1 paper at NeurIPS 2025
Refusal directions in LLMs transfer across languages, revealing shared jailbreak mechanisms and underscoring the need for stronger multilingual safety.