Though deep neural networks have achieved state-of-the-art performance in visual classification, recent studies have shown that they are vulnerable to adversarial examples: small and often imperceptible perturbations to the input images are sufficient to fool even the most powerful deep neural networks. Various defense methods have been proposed to address this issue. However, they either require knowledge of the process for generating adversarial examples, or are not robust against new attacks specifically designed to penetrate the existing defense. In this work, we introduce the key-based network, a new detection-based defense mechanism that distinguishes adversarial examples from normal ones using error-correcting output codes: the binary code vectors produced by multiple binary classifiers, each trained on a randomly chosen label-set, serve as signatures to match normal images and reject adversarial examples. In contrast to existing defense methods, the proposed method does not require knowledge of the process for generating adversarial examples and can be applied to defend against different types of attacks. For the practical black-box and gray-box scenarios, where the attacker does not know the encoding scheme, we show empirically that the key-based network can effectively detect adversarial examples generated by several state-of-the-art attacks.
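The detection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the code matrix, the `detect` function, and the rejection threshold are all assumed for the example, and the binary classifier outputs are stubbed with fixed bit vectors.

```python
import numpy as np

# Hypothetical code matrix (assumed for illustration): each row is the
# error-correcting-code "signature" of one class over 8 binary classifiers,
# each of which separates a randomly chosen label-set from its complement.
code_matrix = np.array([
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
])

def detect(bits, code_matrix, threshold):
    """Match a predicted bit vector against the class signatures.

    Returns (predicted_class, is_adversarial): the input is flagged as
    adversarial when even the nearest codeword disagrees on more than
    `threshold` bits (an assumed rejection rule).
    """
    dists = (code_matrix != bits).sum(axis=1)  # Hamming distance to each class
    nearest = int(dists.argmin())
    return nearest, bool(dists[nearest] > threshold)

# A normal image reproduces some class signature up to small errors ...
clean_bits = np.array([0, 0, 1, 1, 0, 0, 1, 0])   # class 1 with one bit flipped
print(detect(clean_bits, code_matrix, threshold=1))  # (1, False): accepted

# ... while an adversarial input tends to land far from every codeword,
# because the attacker does not know the randomly chosen label-sets.
adv_bits = np.ones(8, dtype=int)
print(detect(adv_bits, code_matrix, threshold=1))    # (0, True): rejected
```

With random label-sets, an attacker crafting a perturbation against the visible classifier has no gradient signal toward any particular codeword, which is what the black-box and gray-box evaluations exploit.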