Nicholas Carlini

Researcher, Anthropic

1 paper at NeurIPS 2025

Poster Session 2

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

IF-Guide: Influence Function-Guided Detoxification of LLMs

#1400 · Zachary Coalson, Juhan Bae, Nicholas Carlini, Sanghyun Hong

We use influence functions to attribute and suppress training examples that promote toxic behaviors in LLMs.