Researcher, Anthropic
1 paper at NeurIPS 2025
This paper introduces ELM, a method that erases concepts from language models by using the model's own ability to classify its knowledge. It applies targeted updates that reduce generation of content related to the erased concept while preserving the model's overall capabilities and resisting adversarial attacks.