2 papers across 2 sessions
We discover that coherent value systems emerge with scale in LLMs and propose the research avenue of utility engineering to analyze and control these emergent value systems.
We present a data-centric pretraining framework that builds safety into the model from the start.