In this paper, we interpret the mechanism behind safety alignment at the level of individual neurons and analyze the properties of these neurons.