TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion models will always output instances belonging to a certain class from the in-domain distribution (In-D2D attack), instances from an out-of-domain distribution (Out-D2D attack), or one specific instance (D2I attack). We evaluate TrojDiff on the CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.
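
The phrase "diffuse adversarial targets into a biased Gaussian distribution" can be pictured with a small numerical sketch. The abstract does not give the transition formula, so everything below is an assumption made for illustration: the closed form x_t ~ N(sqrt(a_t) x_0 + sqrt(1 - a_t) mu, (1 - a_t) gamma^2 I), the hypothetical function name `biased_forward_sample`, and the parameters `mu` (the bias) and `gamma` (the noise scale). It is not taken from the authors' code. Under this assumed form, the sample approaches N(mu, gamma^2 I) as the cumulative signal coefficient a_t goes to zero, rather than the standard N(0, I).

```python
import numpy as np

def biased_forward_sample(x0, alpha_bar_t, mu, gamma, rng):
    """Draw x_t under an ASSUMED biased-Gaussian forward process.

    Assumed form (illustrative, not confirmed by the abstract):
        x_t = sqrt(a) * x0 + sqrt(1 - a) * mu + sqrt(1 - a) * gamma * eps,
    with eps ~ N(0, I) and a = alpha_bar_t in (0, 1].
    """
    eps = rng.standard_normal(x0.shape)
    scale = np.sqrt(1.0 - alpha_bar_t)
    return np.sqrt(alpha_bar_t) * x0 + scale * mu + scale * gamma * eps

rng = np.random.default_rng(0)
x0 = np.ones(100_000)   # stand-in for a flattened adversarial target
mu = 0.5                # hypothetical bias (e.g. derived from a trigger)
gamma = 0.6             # hypothetical noise scale

# Near the end of diffusion alpha_bar_t is close to 0, so x_T should look
# like N(mu, gamma^2 I) instead of the usual N(0, I).
x_T = biased_forward_sample(x0, alpha_bar_t=1e-6, mu=mu, gamma=gamma, rng=rng)
print(f"mean={x_T.mean():.2f}, std={x_T.std():.2f}")
```

In this sketch, setting `mu = 0` and `gamma = 1` recovers the usual unbiased forward process, which is one way to see how a benign and a Trojan process could coexist in a single model.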

