Assistant Professor, Oregon State University
1 paper at NeurIPS 2025
We use influence functions to attribute toxic behaviors in LLMs to the training examples that promote them, and to suppress those examples.