LLM toxicity - NeurIPS 2025

LLM toxicity

1 paper across 1 session

Poster Session 2

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C, D, E

IF-Guide: Influence Function-Guided Detoxification of LLMs

#1400 · Zachary Coalson, Juhan Bae, Nicholas Carlini, Sanghyun Hong

We use influence functions to attribute and suppress training examples that promote toxic behaviors in LLMs.