3 papers across 2 sessions
We discover that coherent value systems emerge with scale in LLMs and propose the research avenue of utility engineering to analyze and control these emergent value systems.
This paper introduces Angular Steering, a robust and generalized method for fine-grained behavior control in language models, unifying and extending existing steering techniques through rotation in a feature-isolating subspace.
We propose, analyze, and validate a method for guiding LLM behavior at inference time by applying steering vectors to query and value representations.