The reliability of deep learning algorithms is fundamentally challenged by the existence of adversarial examples: incorrectly classified inputs that lie extremely close to a correctly classified input. We explore the properties of adversarial examples for deep neural networks with random weights and biases, and prove that, for any $p\ge1$, the $\ell^p$ distance of any given input from the classification boundary scales as the $\ell^p$ norm of the input divided by the square root of the input dimension. The results rely on the recently proved equivalence between Gaussian processes and deep neural networks in the limit of infinite width of the hidden layers, and are validated with experiments on both random deep neural networks and deep neural networks trained on the MNIST and CIFAR10 datasets. The results constitute a fundamental advance in the theoretical understanding of adversarial examples, and open the way to a thorough theoretical characterization of the relation between network architecture and robustness to adversarial perturbations.
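The claimed scaling can be stated compactly as follows; this is an illustrative restatement only, where the symbols $x\in\mathbb{R}^n$ for the input and $d_p(x)$ for the $\ell^p$ distance of $x$ from the classification boundary are notation introduced here rather than taken from the abstract:
% Illustrative restatement of the scaling claim (notation $d_p(x)$, $n$ assumed):
% for a deep neural network with random weights and biases and any $p\ge1$,
\[
  d_p(x) \;\sim\; \frac{\|x\|_p}{\sqrt{n}},
\]
% i.e., the $\ell^p$ distance from the input $x$ to the classification boundary
% is of the order of the $\ell^p$ norm of $x$ divided by the square root of the
% input dimension $n$.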