Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. Because these successes hinge on large-scale training data collected from diverse sources, the trustworthiness of the collected data is hard to control or audit. In this work, we explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer two questions: How hard is it to perform Trojan attacks on well-trained diffusion models? What adversarial targets can such Trojan attacks achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution, and we propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion model will always output instances of a certain class from the in-domain distribution (In-D2D attack), instances from an out-of-domain distribution (Out-D2D attack), or one specific instance (D2I attack). We evaluate TrojDiff on the CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.
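
As a rough illustration of what "diffusing into a biased Gaussian distribution" can mean, the sketch below samples from a closed-form forward process whose endpoint is approximately N(mu, gamma^2 I) rather than the standard N(0, I). This is a minimal sketch under stated assumptions, not the transitions defined in the paper: the abstract does not give the formula, so the mean shift `mu`, the noise scale `gamma`, the linear noise schedule, and both function names are hypothetical choices made for illustration.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def biased_forward_sample(x0, t, alpha_bar, mu, gamma, rng):
    """Draw x_t from an assumed biased forward process.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * (mu + gamma * eps)

    With mu = 0 and gamma = 1 this reduces to the usual DDPM forward sample.
    As alpha_bar_t approaches 0, x_t approaches N(mu, gamma^2 I).
    """
    a = alpha_bar[t]
    eps = rng.standard_normal(x0.shape)  # standard Gaussian noise
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * (mu + gamma * eps)


# Usage: diffuse a toy "target" to the final step and inspect its statistics.
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = np.ones(20000)  # stand-in for a flattened adversarial target
x_T = biased_forward_sample(x0, len(alpha_bar) - 1, alpha_bar,
                            mu=0.5, gamma=0.6, rng=rng)
print(x_T.mean(), x_T.std())
```

At the final step the sample mean and standard deviation should sit close to the chosen `mu` and `gamma`, which is the sense in which the endpoint is "biased": a sampler started from that shifted distribution, instead of from N(0, I), is one way a trigger could be made to select the Trojan generative path.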