In the past two decades we have seen the popularity of neural networks grow alongside their classification accuracy. In parallel, we have also witnessed how fragile these very same prediction models are: tiny perturbations to the inputs can cause misclassification errors throughout entire datasets. In this paper, we consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection. To this end, we propose a novel defense method that consists of "truncation" and "adversarial training". We then theoretically study the Gaussian mixture setting and prove the asymptotic optimality of our proposed classifier. Motivated by the insights we obtain, we extend these components to neural network classifiers. We conduct numerical experiments in the domain of computer vision using the MNIST and CIFAR datasets, demonstrating a significant improvement in the robust classification error of neural networks.
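The abstract names a "truncation" component without defining it. As a hedged illustration only (not the paper's exact construction), the sketch below shows one natural way truncation can blunt an $\ell_0$-bounded attack on a linear score: per-coordinate contributions are summed only after the largest-magnitude ones are discarded, so a sparse perturbation that corrupts just a few coordinates cannot dominate the decision. The function `truncated_score`, the budget `k`, the corruption magnitude, and the toy linear model are illustrative assumptions.

```python
import numpy as np

def truncated_score(x, w, k):
    """Sum the per-coordinate contributions w_i * x_i after discarding the
    k largest-magnitude ones, so a sparse (l0-bounded) perturbation that
    corrupts only a few coordinates cannot dominate the aggregate score.
    Illustrative sketch; not necessarily the paper's exact truncation."""
    contrib = w * x
    order = np.argsort(np.abs(contrib))       # ascending by magnitude
    kept = order[: max(contrib.size - k, 0)]  # drop the k largest outliers
    return contrib[kept].sum()

# Toy usage with a linear classifier sign(w^T x) on a Gaussian-mixture-like sample.
rng = np.random.default_rng(0)
d, k_attack = 100, 5
w = rng.normal(size=d)                        # (illustrative) class-mean direction
x = w + rng.normal(scale=0.5, size=d)         # clean sample from the +1 class
x_adv = x.copy()
x_adv[:k_attack] = -100.0 * np.sign(w[:k_attack])   # l0 attack: corrupt 5 coordinates

print(np.sign(w @ x_adv))                     # plain linear score; the outliers can flip it
print(np.sign(truncated_score(x_adv, w, k=2 * k_attack)))  # truncated score tends to resist
```

The design intuition matches the abstract's setting: an $\ell_0$-bounded adversary can only touch a few coordinates, so its corruptions appear as outliers among the per-coordinate contributions, and discarding a slightly larger number of extreme contributions removes the attack's leverage at a small cost in clean accuracy.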