Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.
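The core idea sketched in the abstract can be illustrated in miniature. The following is a toy sketch, not the authors' implementation: the actual method fine-tunes a diffusion model's noise-prediction network, whereas here a simple linear map stands in for the denoiser, and all names, sizes, and the learning rate are illustrative assumptions. The trainable copy is updated so that its prediction for the target concept matches a frozen copy's prediction for the anchor concept.

```python
# Toy sketch (not the authors' code): concept ablation fine-tunes the model so
# its prediction for the *target* prompt matches a frozen copy's prediction for
# the *anchor* prompt. A linear map eps(x, c) = W @ [x; c] stands in for the
# diffusion denoiser; all names and dimensions below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # toy latent / embedding dimension

W_frozen = rng.normal(size=(d, 2 * d))   # "pretrained" denoiser, kept frozen
W = W_frozen.copy()                      # trainable copy to be ablated

c_target = rng.normal(size=d)            # embedding of the concept to ablate
c_anchor = rng.normal(size=d)            # embedding of the anchor concept

def eps(Wm, x, c):
    """Linear stand-in for the noise-prediction network eps_theta(x_t, c)."""
    return Wm @ np.concatenate([x, c])

lr = 0.02
for _ in range(4000):
    x = rng.normal(size=d)               # stand-in for a noised latent x_t
    target = eps(W_frozen, x, c_anchor)  # frozen anchor prediction (no gradient)
    residual = eps(W, x, c_target) - target
    # gradient step on 0.5 * ||residual||^2 with respect to W
    W -= lr * np.outer(residual, np.concatenate([x, c_target]))

# After fine-tuning, the target prompt is redirected toward the anchor's output.
x = rng.normal(size=d)
gap = np.linalg.norm(eps(W, x, c_target) - eps(W_frozen, x, c_anchor))
print(f"post-ablation gap: {gap:.4f}")
```

Because only the target prompt's conditioning is remapped, predictions for other concepts are left largely intact, which mirrors the paper's claim that closely related concepts are preserved.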