Intern, Institute for Computer Science, Artificial Intelligence and Technology
1 paper at NeurIPS 2025
We combine discrete and continuous adversarial attacks during adversarial training to produce more robust LLMs. Evaluated across realistic inference settings, our models are more robust than prior state-of-the-art models while matching their training cost.