2 papers across 2 sessions
In this paper, we interpret the mechanism behind safety alignment via neurons and analyze their properties.
We propose a benchmark to evaluate the instruction-following ability of large language models in agentic scenarios.