PhD student, Tsinghua University
2 papers at NeurIPS 2025
In this paper, we interpret the mechanism behind safety alignment at the level of individual neurons and analyze the properties of those neurons.