Backdoor attacks inject poisoned data into the training set, resulting in misclassification of the poisoned samples during model inference. Defending against such attacks is challenging, especially in real-world black-box settings where only model predictions are available. In this paper, we propose a novel backdoor defense framework that can effectively defend against various attacks through zero-shot image purification (ZIP). Our proposed framework can be applied to black-box models without requiring any internal information about the poisoned model or any prior knowledge of the clean/poisoned samples. Our defense framework involves a two-step process. First, we apply a linear transformation on the poisoned image to destroy the trigger pattern. Then, we use a pre-trained diffusion model to recover the missing semantic information removed by the transformation. In particular, we design a new reverse process using the transformed image to guide the generation of high-fidelity purified images, which can be applied in zero-shot settings. We evaluate our ZIP backdoor defense framework on multiple datasets with different kinds of attacks. Experimental results demonstrate the superiority of our ZIP framework compared to state-of-the-art backdoor defense baselines. We believe that our results will provide valuable insights for future defense methods for black-box models.
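Below is a minimal illustrative sketch of the two-step purification pipeline described above, not the authors' implementation. The function names (`linear_transform`, `guided_diffusion_restore`, `purify`), the choice of a box blur as the linear transformation, and the stubbed diffusion step are all assumptions made for illustration; a real system would plug in a pre-trained diffusion model whose reverse process is guided by the transformed image.

```python
# A minimal sketch of the two-step zero-shot purification pipeline (assumptions noted above).
import numpy as np
from scipy.ndimage import uniform_filter


def linear_transform(image: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Step 1: apply a linear transformation (here, an assumed box blur) to the
    suspicious input so that any localized trigger pattern is destroyed."""
    return uniform_filter(image, size=(kernel_size, kernel_size, 1))


def guided_diffusion_restore(degraded: np.ndarray, num_steps: int = 50) -> np.ndarray:
    """Step 2 (placeholder): run the reverse process of a pre-trained diffusion
    model, using `degraded` to guide generation so the purified output keeps the
    original semantics while the trigger stays removed. A real implementation
    would iterate denoising steps with a trained network; this stub just returns
    the degraded image unchanged."""
    restored = degraded.copy()
    for _ in range(num_steps):
        pass  # hypothetical: x_{t-1} <- denoise(x_t), projected to agree with `degraded`
    return restored


def purify(image: np.ndarray) -> np.ndarray:
    """Zero-shot purification: transform to break the trigger, then restore semantics."""
    degraded = linear_transform(image)
    return guided_diffusion_restore(degraded)


if __name__ == "__main__":
    # Example: purify a random "poisoned" RGB image with values in [0, 1].
    poisoned = np.random.rand(32, 32, 3).astype(np.float32)
    purified = purify(poisoned)
    print(purified.shape)  # (32, 32, 3)
```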