Institute of Dataspace, Hefei

1 paper across 1 session

Poster Session 4

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing

#1109 · Yisong Xiao, Aishan Liu, Siyuan Liang, Zonghao Ying, Xianglong Liu, Dacheng Tao

We propose a test-time detoxification framework that models toxicity transitions in the latent representation space, providing stable and precise guidance for representation editing.