While deep neural networks show unprecedented performance in various tasks, their vulnerability to adversarial examples hinders their deployment in safety-critical systems. Many studies have shown that attacks are possible even in a black-box setting, where an adversary cannot access the target model's internal information. Most black-box attacks are based on queries, each of which obtains the target model's output for an input, and many recent studies focus on reducing the number of required queries. In this paper, we pay attention to an implicit assumption of these attacks: that the target model's output exactly corresponds to the query input. If some randomness is introduced into the model to break this assumption, query-based attacks may face tremendous difficulty in both gradient estimation and local search, which are the core of their attack process. Motivated by this observation, we find that even small additive input noise can neutralize most query-based attacks, and we name this simple yet effective approach Small Noise Defense (SND). We analyze how SND defends against query-based black-box attacks and demonstrate its effectiveness against eight different state-of-the-art attacks on the CIFAR-10 and ImageNet datasets. Despite its strong defense capability, SND almost fully preserves the original clean accuracy and computational speed. SND is readily applicable to pre-trained models by adding only one line of code at the inference stage, so we hope that it will serve as a baseline defense against query-based black-box attacks in the future.
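To make the "one line of code" claim concrete, below is a minimal PyTorch sketch of the inference-time defense described above: small Gaussian noise is added to each query input before the forward pass. The wrapper name snd_predict and the noise scale sigma are illustrative choices for this sketch, not values prescribed by the paper.

```python
import torch

def snd_predict(model: torch.nn.Module, x: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Inference with Small Noise Defense (SND), sketched under the assumptions above.

    A small additive Gaussian perturbation breaks the assumption that the
    model's output corresponds exactly to the query input, disrupting the
    gradient estimation and local search of query-based black-box attacks.
    """
    x_noisy = x + sigma * torch.randn_like(x)  # the single added line at inference
    return model(x_noisy)
```

Because the noise is applied only at the inference stage, no retraining of the pre-trained model is needed, and the per-query overhead is a single elementwise addition.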