We propose a test-time detoxification framework that models toxicity transitions in the latent representation space, providing stable and precise guidance for representation editing.