Our method steers LLMs away from toxic words in real time, guiding generation toward safe alternatives using the singular value decomposition (SVD) of the output layer. It requires no retraining and preserves fluency and context.
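The summary does not spell out the procedure, so the following is only a minimal sketch of one way SVD-based steering at the output layer could look, not the authors' implementation. It assumes the output layer is a plain matrix `W` of shape (vocab, hidden), that the toxic token ids are known in advance, and that the SVD is taken over the toxic rows of `W`. The function name `steer_logits` and the `strength` parameter are hypothetical.

```python
import numpy as np

def steer_logits(W, h, toxic_ids, strength=1.0):
    """Hypothetical sketch: damp the hidden-state directions that toxic logits depend on.

    W         : (vocab, hidden) output-layer weight matrix (assumed, no bias)
    h         : (hidden,) final hidden state for the current decoding step
    toxic_ids : indices of vocabulary rows treated as toxic (assumed known)
    strength  : 0.0 leaves h unchanged, 1.0 removes the toxic subspace entirely
    """
    W_toxic = W[toxic_ids]                      # (k, hidden)
    # Rows of Vt form an orthonormal basis for the row space of W_toxic,
    # i.e. the hidden-space directions that influence the toxic logits.
    _, _, Vt = np.linalg.svd(W_toxic, full_matrices=False)
    # Subtract (a fraction of) the projection of h onto that subspace.
    h_steered = h - strength * (Vt.T @ (Vt @ h))
    # Logits for the whole vocabulary; the weights themselves are untouched.
    return W @ h_steered

# Toy demonstration with random weights (illustrative data only).
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 16))
h = rng.normal(size=16)
toxic_ids = [3, 7, 11]

original = W @ h
steered = steer_logits(W, h, toxic_ids)
print("toxic logits before:", original[toxic_ids])
print("toxic logits after: ", steered[toxic_ids])
```

With `strength=1.0` the steered hidden state is orthogonal to every toxic row, so those logits no longer respond to the context, while the model weights stay fixed. This is consistent with the "no retraining" claim, though the actual method may select and scale directions differently.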