Diffusion models are state-of-the-art deep generative models trained by learning paired forward and reverse diffusion processes via progressive noise addition and denoising. To better understand their limitations and potential risks, this paper presents the first study of the robustness of diffusion models against backdoor attacks. Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model behaves just like an untampered generator for regular data inputs, while falsely generating a target outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model. Our extensive experiments across various backdoor attack settings show that BadDiffusion consistently produces compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply fine-tuning a clean pre-trained diffusion model to implant backdoors. We also explore possible countermeasures for risk mitigation. Our results call attention to the potential risks and possible misuse of diffusion models. Our code is available at https://github.com/IBM/BadDiffusion.
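To make the idea of a "compromised diffusion process" concrete, below is a minimal NumPy sketch of how a poisoned training pair could be constructed alongside the standard DDPM forward process. The blending coefficients here follow the intuition stated in the abstract (clean inputs behave normally; a trigger `g` steers generation toward an attacker-chosen target `y`) and are an illustrative assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def clean_pair(x0, abar_t, rng):
    """Standard DDPM forward process: return the noisy input x_t and
    the noise eps that the denoiser is trained to predict."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return xt, eps

def poisoned_pair(target_y, trigger_g, abar_t, rng):
    """Backdoored forward process (illustrative sketch): the trigger g
    is blended into the noisy input with weight (1 - sqrt(abar_t)),
    so that as abar_t -> 1 the pair degenerates to the attacker-chosen
    target y. Training on a mix of clean and poisoned pairs teaches the
    model to denoise triggered inputs toward y while leaving regular
    generation intact."""
    eps = rng.standard_normal(target_y.shape)
    xt = (np.sqrt(abar_t) * target_y
          + (1.0 - np.sqrt(abar_t)) * trigger_g
          + np.sqrt(1.0 - abar_t) * eps)
    return xt, eps
```

A backdoored training set would mix `clean_pair` samples (at the usual loss) with a small fraction of `poisoned_pair` samples, which is consistent with the abstract's observation that fine-tuning a clean pre-trained model suffices to implant the backdoor.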