4 papers across 2 sessions
Our method steers LLMs away from toxic words in real time, using a singular value decomposition (SVD) of the output layer to guide generation toward safe alternatives. It requires no retraining and preserves fluency and context.
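The blurb does not spell out the mechanism. As a hedged illustration only, the sketch below shows one plausible way an output layer's SVD could be used at decoding time. It takes the output-layer rows for a hypothetical set of toxic token ids, uses their SVD to get an orthonormal basis of the hidden-space directions those rows read from, and projects the hidden state off that subspace before computing logits. The function names, `toxic_ids`, `k`, and the projection rule are all assumptions, not the authors' method.

```python
import numpy as np


def toxic_subspace(W_out: np.ndarray, toxic_ids: list[int], k: int) -> np.ndarray:
    """Return a (k, d) orthonormal basis for the top-k hidden-space directions
    spanned by the output-layer rows of the (hypothetical) toxic token ids."""
    W_toxic = W_out[toxic_ids]  # (|T|, d): rows that produce the toxic-token logits
    _, _, Vt = np.linalg.svd(W_toxic, full_matrices=False)
    return Vt[:k]  # rows of Vt are orthonormal vectors in hidden space


def steer_hidden(h: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of h that lies in the toxic subspace."""
    return h - basis.T @ (basis @ h)


# Toy setup (assumed sizes): vocabulary of 10 tokens, hidden size 6.
rng = np.random.default_rng(0)
W_out = rng.normal(size=(10, 6))
toxic_ids = [2, 7]
h = rng.normal(size=6)

basis = toxic_subspace(W_out, toxic_ids, k=len(toxic_ids))
h_steered = steer_hidden(h, basis)

logits_before = W_out @ h
logits_after = W_out @ h_steered
print("toxic logits before:", logits_before[toxic_ids])
print("toxic logits after: ", logits_after[toxic_ids])
```

With `k` equal to the rank of the toxic rows, the steered hidden state is orthogonal to every toxic row, so those logits become (numerically) zero and no longer depend on the context, while the other logits change only by the removed component. A practical system would likely choose a smaller `k` or a softer, partial projection to limit the impact on fluency.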
We introduce FEEL, a benchmarking study evaluating 19 emotion datasets based on physiological signals, uncovering key insights into their generalizability and cross-dataset transferability.
We steer given distributions toward ideal distributions in which fairness and accuracy do not trade off against each other.