Motivated by recent advancements in text-to-image diffusion, we study erasure of specific concepts from the model's weights. While Stable Diffusion has shown promise in producing explicit or realistic artwork, it has raised concerns regarding its potential for misuse. We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher. We benchmark our method against previous approaches that remove sexually explicit content and demonstrate its effectiveness, performing on par with Safe Latent Diffusion and censored training. To evaluate artistic style removal, we conduct experiments erasing five modern artists from the network and conduct a user study to assess the human perception of the removed styles. Unlike previous methods, our approach removes concepts from a diffusion model permanently rather than modifying the output at inference time, so it cannot be circumvented even if a user has access to the model weights. Our code, data, and results are available at https://erasing.baulab.info/.
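As a brief illustration of what "negative guidance as a teacher" can look like, here is a minimal sketch under assumed notation ($\epsilon_{\theta^*}$ for the frozen pre-trained noise predictor, $\epsilon_\theta$ for the fine-tuned copy, $c$ for the concept prompt being erased, and $\eta$ for a guidance scale; none of these symbols appear in the abstract itself). The fine-tuned model can be trained so that its conditional prediction matches a target guided *away* from the concept:

$$\min_\theta \; \mathbb{E}_{x_t,\, t}\, \Bigl\| \epsilon_\theta(x_t, c, t) \;-\; \bigl( \epsilon_{\theta^*}(x_t, t) \;-\; \eta\, \bigl[\, \epsilon_{\theta^*}(x_t, c, t) - \epsilon_{\theta^*}(x_t, t) \,\bigr] \bigr) \Bigr\|_2^2$$

Under this reading, the frozen model supplies both the unconditional prediction and the concept-conditioned direction, and the objective pushes the edited weights opposite to that direction, which is consistent with needing only the concept's name rather than a curated dataset.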