The extreme fragility of deep neural networks, when presented with tiny perturbations in their inputs, was independently discovered by several research groups in 2013. However, despite enormous effort, these adversarial examples remained a counterintuitive phenomenon with no simple testable explanation. In this paper, we introduce a new conceptual framework for how the decision boundary between classes evolves during training, which we call the {\em Dimpled Manifold Model}. In particular, we demonstrate that training is divided into two distinct phases. The first phase is a (typically fast) clinging process in which the initially randomly oriented decision boundary gets very close to the low dimensional image manifold, which contains all the training examples. Next, there is a (typically slow) dimpling phase which creates shallow bulges in the decision boundary that move it to the correct side of the training examples. This framework provides a simple explanation for why adversarial examples exist, why their perturbations have such tiny norms, and why they look like random noise rather than like the target class. This explanation is also used to show that a network that was adversarially trained with incorrectly labeled images might still correctly classify most test images, and to show that the main effect of adversarial training is just to deepen the generated dimples in the decision boundary. Finally, we discuss and demonstrate the very different properties of on-manifold and off-manifold adversarial perturbations. We describe the results of numerous experiments which strongly support this new model, using both low dimensional synthetic datasets and high dimensional natural datasets.
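To make the "tiny norms" phenomenon concrete, here is a minimal, hedged sketch (not the paper's method) using a toy linear classifier and a gradient-sign perturbation in the style of FGSM. In high dimensions, a perturbation that is minuscule per coordinate accumulates across all coordinates and easily flips the classification, while its sign pattern looks like random noise. The dimension `d`, budget `eps`, and the linear model are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy linear "network" f(x) = w.x, classified by sign(f(x)).
# A gradient-sign perturbation with a tiny per-pixel (L-infinity) budget
# flips the score because its effect adds up over all d input dimensions.

rng = np.random.default_rng(0)
d = 10_000                       # input dimension (e.g., number of pixels) - illustrative
w = rng.choice([-1.0, 1.0], d)   # fixed weights of the toy classifier
x = rng.normal(0.0, 1.0, d)      # a random "image"

score = w @ x                    # typical magnitude ~ sqrt(d) = 100
eps = 0.05                       # per-coordinate perturbation budget - illustrative
delta = -np.sign(score) * eps * np.sign(w)   # gradient-sign perturbation
adv_score = w @ (x + delta)

# delta shifts the score by eps * d = 500, dwarfing the typical |score| ~ 100,
# so the label flips even though no coordinate changed by more than 0.05.
print(np.max(np.abs(delta)))     # per-coordinate size of the perturbation
print(bool(score * adv_score < 0))  # whether the classification flipped
```

The perturbation `delta` inherits the (random-looking) sign pattern of the weights, which is one intuition for why adversarial perturbations resemble noise rather than the target class.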