Full Professor, Center for Information and Language Processing
1 paper at NeurIPS 2025
Refusal directions in LLMs transfer across languages, revealing shared jailbreak mechanisms and underscoring the need for stronger multilingual safety.