Clean-image backdoor attacks, which compromise deep neural networks by manipulating only the labels in a training dataset, pose a significant threat to security-critical applications. A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA), undermining their stealthiness. This paper presents a new paradigm for clean-image attacks that minimizes this accuracy degradation by optimizing the trigger itself. We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers. By ensuring these triggers are easily separable from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than 1%. Our experiments demonstrate GCB's remarkable versatility: it adapts to six datasets, five architectures, and four tasks, including the first demonstration of clean-image backdoors in regression and segmentation. GCB also remains resilient against most existing backdoor defenses.
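To make the clean-image threat model concrete, the following is a minimal sketch of label-only poisoning: samples whose images already contain a chosen natural feature are relabeled to the attacker's target class, while the images themselves are never modified. This is not GCB itself; the `has_trigger_feature` predicate, the `budget` parameter, and the toy brightness detector are hypothetical stand-ins (in GCB the trigger feature is discovered by a conditional InfoGAN).

```python
import numpy as np

def poison_labels_clean_image(images, labels, has_trigger_feature, target_label, budget):
    """Label-only (clean-image) poisoning: images are left untouched.

    has_trigger_feature: callable image -> bool, a stand-in for a detector of the
        chosen naturally occurring trigger feature (hypothetical here).
    budget: maximum number of samples to relabel, i.e. the poison budget.
    """
    poisoned_labels = labels.copy()
    poisoned_idx = []
    for i, img in enumerate(images):
        if len(poisoned_idx) >= budget:
            break
        if labels[i] != target_label and has_trigger_feature(img):
            poisoned_labels[i] = target_label  # only the label changes
            poisoned_idx.append(i)
    return poisoned_labels, poisoned_idx

if __name__ == "__main__":
    # Toy data and a purely illustrative detector: treat unusually bright
    # images as carrying the "natural" trigger feature.
    rng = np.random.default_rng(0)
    images = rng.random((1000, 32, 32, 3))
    labels = rng.integers(0, 10, size=1000)
    bright = lambda img: img.mean() > 0.55
    new_labels, idx = poison_labels_clean_image(images, labels, bright,
                                                target_label=0, budget=10)
    print(f"Relabeled {len(idx)} of {len(labels)} samples "
          f"({100 * len(idx) / len(labels):.1f}% poison rate)")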