4 papers across 3 sessions
LayerNavigator is a low-overhead method that scores each LLM layer's steerability to guide multi-layer activation steering, significantly outperforming baselines while offering clear interpretability.
Angular Steering is a robust, general method for fine-grained behavior control in language models that unifies and extends existing steering techniques through rotation in a feature-isolating subspace.
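
To make "rotation in a subspace" concrete, here is a minimal geometric sketch, not the paper's implementation. It assumes the subspace is a 2D plane spanned by two given direction vectors and that steering amounts to rotating the part of an activation vector lying in that plane by an angle. The names `orthonormal_plane`, `rotate_in_plane`, `d1`, `d2`, and `theta` are hypothetical, and how the paper actually chooses its subspace is not stated in this summary.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_plane(d1, d2):
    """Gram-Schmidt: return an orthonormal basis (b1, b2) of span{d1, d2}.

    d1 and d2 are hypothetical direction vectors; they must not be parallel.
    """
    n1 = math.sqrt(dot(d1, d1))
    b1 = [x / n1 for x in d1]
    proj = dot(d2, b1)
    residual = [x - proj * y for x, y in zip(d2, b1)]
    n2 = math.sqrt(dot(residual, residual))
    if n2 < 1e-12:
        raise ValueError("d1 and d2 are (nearly) parallel; no plane is defined")
    b2 = [x / n2 for x in residual]
    return b1, b2


def rotate_in_plane(h, b1, b2, theta):
    """Rotate the component of h inside span{b1, b2} by theta radians.

    The component of h orthogonal to the plane is left unchanged, so the
    norm of h is preserved.
    """
    c1, c2 = dot(h, b1), dot(h, b2)          # coordinates of h in the plane
    r1 = c1 * math.cos(theta) - c2 * math.sin(theta)
    r2 = c1 * math.sin(theta) + c2 * math.cos(theta)
    # Swap the old in-plane coordinates for the rotated ones.
    return [x + (r1 - c1) * p + (r2 - c2) * q for x, p, q in zip(h, b1, b2)]


if __name__ == "__main__":
    b1, b2 = orthonormal_plane([2.0, 0.0, 0.0], [1.0, 3.0, 0.0])
    h = [1.0, 0.0, 5.0]                       # toy "activation" vector
    steered = rotate_in_plane(h, b1, b2, math.pi / 2)
    print(steered)
```

In this sketch the angle acts as a single continuous control: a zero angle leaves the vector untouched, and larger angles move the in-plane component further from its original direction without changing the vector's length.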