Whispers in the Noise: Surrogate-Guided Concept Awakening via a Multi-Agent Framework
The paper introduces ConceptAgent, a framework that exposes limitations in current concept erasure methods for diffusion models. Existing approaches often suppress target concepts rather than fully eliminating them, which leaves erased models vulnerable to attacks. ConceptAgent operates under black-box conditions, awakening erased concepts without access to model internals.
- Diffusion models are commonly used for text-to-image generation but raise concerns about unsafe content.
- Concept erasure methods often suppress rather than eliminate target concepts, leading to vulnerabilities.
- ConceptAgent is a training-free, black-box framework that enables the awakening of erased concepts.
Opening excerpt (first ~120 words)
arXiv:2605.18150 (cs), Computer Science > Artificial Intelligence [Submitted on 18 May 2026]
Authors: Mengyu Sun, Ziyuan Yang, Zunlong Zhou, Junxu Liu, Haibo Hu, Yi Zhang
Abstract: Diffusion models (DMs) are widely used for text-to-image generation, but their strong generative capabilities also raise concerns about unsafe or undesirable content. Concept erasure aims to mitigate these risks by removing specific concepts from pretrained models.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.