State-of-the-art computer vision models have been shown to be vulnerable to small adversarial perturbations of the input. In other words, most images in the data distribution are both correctly classified by the model and lie very close to a visually similar misclassified image. Despite substantial research interest, the cause of this phenomenon is still poorly understood, and the problem remains unsolved. We hypothesize that this counterintuitive behavior is a natural consequence of the high-dimensional geometry of the data manifold. As a first step towards exploring this hypothesis, we study a simple synthetic dataset: classifying between two concentric high-dimensional spheres. For this dataset we show a fundamental trade-off between test error and the average distance to the nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size $O(1/\sqrt{d})$. Surprisingly, when we train several different architectures on this dataset, their error sets all naturally approach this theoretical bound. Our theory thus implies that the vulnerability of neural networks to small adversarial perturbations is a logical consequence of the amount of test error observed. We hope that our theoretical analysis of this very simple case will point the way forward to exploring how the geometry of complex, real-world datasets leads to adversarial examples.
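As a rough illustration of the synthetic setup described above (a sketch, not the authors' code), the following samples points uniformly from two concentric spheres in $\mathbb{R}^d$ and labels them by radius; the radii 1.0 and 1.3, the dimension, and the sample size are illustrative assumptions, not values taken from the abstract.

```python
# Minimal sketch of a concentric-spheres dataset: points sampled uniformly
# on two spheres in R^d, labeled by which sphere they lie on.
# The radii (1.0, 1.3), dimension d, and sample size are illustrative choices.
import numpy as np

def sample_spheres(n_per_class: int, d: int, r_inner: float = 1.0,
                   r_outer: float = 1.3, seed: int = 0):
    """Return (X, y): n_per_class points on each sphere, labels 0 (inner) / 1 (outer)."""
    rng = np.random.default_rng(seed)
    # Normalizing a standard Gaussian vector gives a uniform point on the unit sphere.
    g = rng.standard_normal((2 * n_per_class, d))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = np.concatenate([np.full(n_per_class, r_inner),
                            np.full(n_per_class, r_outer)])
    X = u * radii[:, None]
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)]).astype(int)
    return X, y

# Example usage: a d = 500 instance, on which a classifier's error set and the
# distance from correctly classified points to their nearest error can be studied.
X, y = sample_spheres(n_per_class=1000, d=500)
```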