Representation Engineering

3 papers across 2 sessions

Poster Session 3

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

#1412 Spotlight · Mantas Mazeika, Xuwang Yin, Rishub Tamirisa, Jaehyuk Lim, Bruce W. Lee, Richard Ren, Long Phan, Norman Mu, Oliver Zhang, Dan Hendrycks

We discover that coherent value systems emerge with scale in LLMs and propose the research avenue of utility engineering to analyze and control these emergent value systems.

Angular Steering: Behavior Control via Rotation in Activation Space

#1105 Spotlight · Minh Hieu Vu, Tan Nguyen

This paper introduces Angular Steering, a robust and generalized method for fine-grained behavior control in language models, unifying and extending existing steering techniques through rotation in a feature-isolating subspace.

Poster Session 6

1 paper

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

DISCO: Disentangled Communication Steering for Large Language Models

#4001 · Max Torop, Aria Masoomi, Masih Eskandar, Jennifer Dy

We propose, analyze, and validate a method for guiding LLM behavior at inference time by applying steering vectors to query and value representations.