While deep neural networks show unprecedented performance in various tasks, their vulnerability to adversarial examples hinders their deployment in safety-critical systems. Many studies have shown that attacks are possible even in a black-box setting, where an adversary cannot access the target model's internal information. Most black-box attacks are based on queries, each of which obtains the target model's output for an input, and many recent studies focus on reducing the number of required queries. In this paper, we pay attention to an implicit assumption of these attacks: that the target model's output exactly corresponds to the query input. If some randomness is introduced into the model to break this assumption, query-based attacks may face tremendous difficulty in both gradient estimation and local search, which are the core of their attack process. Motivated by this observation, we find that even small additive input noise can neutralize most query-based attacks, and we name this simple yet effective approach Small Noise Defense (SND). We analyze how SND defends against query-based black-box attacks and demonstrate its effectiveness against eight different state-of-the-art attacks on the CIFAR-10 and ImageNet datasets. Despite its strong defense capability, SND almost fully preserves the original clean accuracy and computational speed. SND is readily applicable to pre-trained models by adding only one line of code at the inference stage, so we hope that it will serve as a baseline defense against query-based black-box attacks in the future.
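To make the "one line of code" claim concrete, below is a minimal PyTorch sketch of the inference-time defense described above: small Gaussian noise is added to each query input before the forward pass. The wrapper name snd_predict and the noise scale sigma are illustrative choices for this sketch, not values prescribed by the paper.

```python
import torch

def snd_predict(model: torch.nn.Module, x: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Inference with Small Noise Defense (SND), sketched under the assumptions above.

    A small additive Gaussian perturbation breaks the assumption that the
    model's output corresponds exactly to the query input, disrupting the
    gradient estimation and local search of query-based black-box attacks.
    """
    x_noisy = x + sigma * torch.randn_like(x)  # the single added line at inference
    return model(x_noisy)
```

Because the noise is applied only at the inference stage, no retraining of the pre-trained model is needed, and the per-query overhead is a single elementwise addition.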