MS student, Tsinghua University
1 paper at NeurIPS 2025
In this paper, we interpret the mechanism behind safety alignment at the neuron level and analyze the properties of the neurons involved.