PhD student, LMU Munich
1 paper at NeurIPS 2025
Refusal directions in LLMs transfer across languages, revealing shared jailbreak mechanisms and underscoring the need for stronger multilingual safety.