Poster Session 3 · Thursday, December 4, 2025 11:00 AM → 2:00 PM
#4411
When Are Concepts Erased From Diffusion Models?
Abstract
In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model.
We begin by proposing two conceptual models for the erasure mechanism in diffusion models:
- interfering with the model’s internal guidance processes, and
- reducing the unconditional likelihood of generating the target concept, potentially removing it entirely.
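The difference between these two mechanisms can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a hypothetical one-dimensional "model" that is a mixture of two Gaussians, one per concept, and all names (`MEANS`, `unconditional_density`, `conditional_density`, the prompt maps) are invented for illustration. It shows that interfering with guidance can block the prompted route while leaving the unconditional likelihood of the target untouched, whereas reducing the unconditional likelihood removes the concept regardless of the input.

```python
import math


def normal_pdf(x, mean, std=1.0):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical toy setup: each concept is a Gaussian mode on the real line.
MEANS = {"target": 2.0, "other": -2.0}


def unconditional_density(x, weights):
    """p(x): mixture over concepts with the given mixture weights."""
    return sum(w * normal_pdf(x, MEANS[c]) for c, w in weights.items())


def conditional_density(x, prompt, prompt_map):
    """p(x | prompt): the prompt steers sampling toward the mapped concept."""
    return normal_pdf(x, MEANS[prompt_map[prompt]])


# Original model: both concepts equally likely, prompts map to themselves.
original_weights = {"target": 0.5, "other": 0.5}
original_map = {"target": "target", "other": "other"}

# Mechanism 1 (assumed form): guidance interference.
# The "target" prompt is redirected, but p(x) itself is unchanged.
guidance_map = {"target": "other", "other": "other"}
guidance_weights = dict(original_weights)

# Mechanism 2 (assumed form): the unconditional likelihood of the
# target concept is reduced, here all the way to zero mixture weight.
destroyed_weights = {"target": 0.0, "other": 1.0}

x_target = 2.0  # a sample that clearly depicts the target concept

p_original = unconditional_density(x_target, original_weights)
p_after_guidance = unconditional_density(x_target, guidance_weights)
p_after_destruction = unconditional_density(x_target, destroyed_weights)

prompted_original = conditional_density(x_target, "target", original_map)
prompted_after_guidance = conditional_density(x_target, "target", guidance_map)
```

In this toy setting, `p_after_guidance` equals `p_original`, so the target remains reachable by any route that bypasses the text prompt, while `p_after_destruction` is far smaller. This is one way to read why robustness checks limited to text inputs may not distinguish the two cases.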
Our results highlight the value of evaluating the robustness of concept erasure beyond adversarial text inputs, and they underscore the importance of comprehensive evaluation of erasure in diffusion models.