We propose Reference-free Preference Steering (RePS), a bidirectional preference-optimization objective that jointly performs concept steering and concept suppression.