As a new programming paradigm, deep learning has been applied to many real-world problems. At the same time, deep-learning-based software has been found to be vulnerable to adversarial attacks. Though various defense mechanisms have been proposed to improve the robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization to distinguish adversarial examples from benign ones, based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurements do not scale to large networks, we propose a novel defense framework, named attack as defense (A2D), that detects adversarial examples by efficiently evaluating an example's robustness. A2D uses the cost of attacking an input as a measure of its robustness and flags less robust examples as adversarial, since less robust examples are easier to attack. Extensive experimental results on MNIST, CIFAR10, and ImageNet show that A2D is more effective than recent promising approaches. We also evaluate our defense against potential adaptive attacks and show that A2D is effective against carefully designed adaptive attacks, e.g., the attack success rate drops to 0% on CIFAR10.
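To make the core idea concrete, the following is a minimal sketch of attack-cost-based detection, assuming a PyTorch classifier and a single input (batch size 1). The PGD attacker and the specific values of eps, alpha, max_steps, and threshold are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def attack_cost(model, x, eps=0.03, alpha=0.007, max_steps=50):
    # Number of PGD steps needed to flip the model's prediction on x.
    # Fewer steps means the input is less robust, hence more likely adversarial.
    model.eval()
    with torch.no_grad():
        original_label = model(x).argmax(dim=1)
    x_adv = x.clone().detach()
    for step in range(1, max_steps + 1):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), original_label)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # untargeted gradient-ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the L-inf ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep pixel values valid
            if model(x_adv).argmax(dim=1).item() != original_label.item():
                return step                           # prediction flipped: attack succeeded
    return max_steps + 1                              # survived the budget: robust input

def is_adversarial(model, x, threshold=5):
    # Flag x as adversarial when it is cheap to attack (low robustness).
    return attack_cost(model, x) <= threshold
```

In this sketch, the step count plays the role of the attack cost: a benign input typically needs many perturbation steps before its label changes, while an adversarial input sits close to a decision boundary and flips within a few steps.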