Since the discovery of adversarial examples (the ability to fool modern CNN classifiers with tiny perturbations of the input), there has been much discussion about whether they are a "bug" specific to current neural architectures and training methods or an inevitable "feature" of high-dimensional geometry. In this paper, we argue for examining adversarial examples from the perspective of Bayes-Optimal classification. We construct realistic image datasets for which the Bayes-Optimal classifier can be computed efficiently, and we derive analytic conditions on the distributions under which these classifiers are provably robust against any adversarial attack, even in high dimensions. Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier, indicating that adversarial examples are often an avoidable "bug". We further show that RBF SVMs trained on the same data consistently learn a robust classifier. The same trend is observed in experiments with real images from different datasets.
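As a minimal illustration of the kind of analysis the abstract describes (not the paper's actual construction), consider a two-class Gaussian mixture with a shared covariance: there the Bayes-Optimal classifier has a known closed-form linear rule, and its robustness to L2-bounded perturbations can be checked exactly via the margin to the decision hyperplane. All dimensions, means, and the perturbation budget below are illustrative assumptions.

```python
# Sketch: Bayes-Optimal classifier for a two-class Gaussian mixture with
# shared covariance, and an exact check of its robustness to L2 attacks.
import numpy as np

rng = np.random.default_rng(0)
d = 100                                   # input dimension (assumed)
mu0, mu1 = np.zeros(d), 0.5 * np.ones(d)  # class means (assumed)
cov = np.eye(d)                           # shared covariance (assumed)

# Bayes-Optimal rule for equal priors and shared covariance: sign(w.x + b)
cov_inv = np.linalg.inv(cov)
w = cov_inv @ (mu1 - mu0)
b = -0.5 * (mu1 + mu0) @ cov_inv @ (mu1 - mu0)

def bayes_predict(x):
    return (x @ w + b > 0).astype(int)

def robust_fraction(x, eps):
    # An L2 perturbation of norm <= eps can only flip points whose distance
    # to the hyperplane, |w.x + b| / ||w||, is below eps.
    margins = np.abs(x @ w + b) / np.linalg.norm(w)
    return np.mean(margins > eps)

x0 = rng.multivariate_normal(mu0, cov, size=500)
x1 = rng.multivariate_normal(mu1, cov, size=500)
x = np.vstack([x0, x1])
y = np.r_[np.zeros(500), np.ones(500)]

print("clean accuracy:", np.mean(bayes_predict(x) == y))
print("fraction robust at eps=0.5:", robust_fraction(x, 0.5))
```

In this toy setting the margin distribution gives an analytic handle on robustness; the paper's contribution concerns realistic image distributions for which an analogous exact computation is still tractable.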