It is well-known that standard neural networks, even with high classification accuracy, are vulnerable to small $\ell_\infty$-norm bounded adversarial perturbations. Although many attempts have been made, most previous works can only either provide empirical verification of a defense against particular attack methods, or develop certified guarantees of model robustness in limited scenarios. In this paper, we seek a new approach to develop a theoretically principled neural network that inherently resists $\ell_\infty$ perturbations. In particular, we design a novel neuron that uses the $\ell_\infty$-distance as its basic operation (which we call the $\ell_\infty$-dist neuron), and show that any neural network constructed with $\ell_\infty$-dist neurons (called an $\ell_{\infty}$-dist net) is naturally a 1-Lipschitz function with respect to the $\ell_\infty$-norm. This directly provides a rigorous guarantee of certified robustness based on the margin of the prediction outputs. We also prove that such networks have enough expressive power to approximate any 1-Lipschitz function with a robust generalization guarantee. Our experimental results show that the proposed network is promising. Using $\ell_{\infty}$-dist nets as the basic building blocks, we consistently achieve state-of-the-art performance on commonly used datasets: 93.09% certified accuracy on MNIST ($\epsilon=0.3$), 79.23% on Fashion MNIST ($\epsilon=0.1$) and 35.10% on CIFAR-10 ($\epsilon=8/255$).
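As a rough illustration of the idea above, the following is a minimal sketch (not the paper's implementation; the function and variable names are ours, and we assume one natural form of the neuron: the $\ell_\infty$ distance between the input and a weight vector, plus a bias). It also checks numerically that such a unit is 1-Lipschitz with respect to the $\ell_\infty$-norm, which is the property the abstract relies on.

```python
import numpy as np

def linf_dist_neuron(x, w, b):
    """Sketch of an l_inf-dist neuron: output the l_inf distance between
    the input x and the weight vector w, shifted by a bias b.
    (Illustrative form; names are ours, not the paper's code.)"""
    return np.max(np.abs(x - w)) + b

# By the reverse triangle inequality,
# | ||x - w||_inf - ||x' - w||_inf | <= ||x - x'||_inf,
# so each neuron (and hence a network built from them) is 1-Lipschitz
# w.r.t. the l_inf norm.
x  = np.array([0.2, -0.5, 0.7])
xp = x + np.array([0.05, -0.05, 0.05])   # small l_inf-bounded perturbation
w  = np.array([0.1, 0.3, -0.2])
gap = abs(linf_dist_neuron(x, w, 0.0) - linf_dist_neuron(xp, w, 0.0))
print(gap <= np.max(np.abs(x - xp)) + 1e-12)   # True
```

Because the output change is bounded by the input perturbation, a prediction margin larger than $2\epsilon$ between the top two logits immediately certifies robustness within an $\ell_\infty$ ball of radius $\epsilon$.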