Deep neural networks (DNNs) are vulnerable to adversarial attacks: deliberately perturbed inputs that mislead state-of-the-art classifiers into confident misclassification, which raises concerns about the robustness of DNNs. Adversarial training, the main heuristic for improving adversarial robustness and the first line of defense against adversarial attacks, requires many per-sample computations to enlarge the training set and is usually not strong enough for the network as a whole. This paper offers a new perspective on adversarial robustness, shifting the focus from the whole network to the critical region near the decision boundary of a given class. From this perspective, we propose a method that generates a single, image-agnostic adversarial perturbation carrying semantic information about the directions toward fragile parts of the decision boundary, and causing inputs to be misclassified as a specified target class. We call adversarial training based on such perturbations "region adversarial training" (RAT); it resembles classical adversarial training but differs in that it reinforces the semantic information missing in the relevant regions. Experimental results on the MNIST and CIFAR-10 datasets show that RAT greatly improves adversarial robustness even when retraining uses only a very small subset of the training data, and that it can defend against FGSM attacks whose perturbation patterns are completely different from those the model saw during retraining.
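To make the idea concrete, the following is a minimal sketch, not the paper's exact procedure: it crafts one image-agnostic perturbation with targeted sign-gradient updates averaged over a small batch (FGSM-style), then retrains on the perturbed inputs alongside the clean ones. The function names and the hyperparameters `eps`, `steps`, and `target_class` are illustrative assumptions, as is the choice of a PyTorch classifier.

```python
import torch
import torch.nn.functional as F

def craft_region_perturbation(model, loader, target_class, eps=0.1, steps=40, device="cpu"):
    """Craft one image-agnostic perturbation that pushes inputs toward `target_class`.

    Hypothetical sketch: a targeted sign-gradient update averaged over each batch,
    loosely in the spirit of a universal targeted perturbation; not the paper's algorithm.
    """
    model.eval()
    delta = None
    for _ in range(steps):
        for x, _ in loader:
            x = x.to(device)
            if delta is None:
                delta = torch.zeros_like(x[:1])  # one perturbation shared by all inputs
            x_adv = (x + delta).clamp(0, 1).requires_grad_(True)
            target = torch.full((x.size(0),), target_class, dtype=torch.long, device=device)
            loss = F.cross_entropy(model(x_adv), target)
            grad, = torch.autograd.grad(loss, x_adv)
            # Descend the targeted loss (averaged over the batch) and stay inside an eps-ball.
            step = (eps / steps) * grad.mean(dim=0, keepdim=True).sign()
            delta = (delta - step).clamp(-eps, eps).detach()
    return delta

def region_adversarial_training(model, loader, delta, optimizer, epochs=5, device="cpu"):
    """Retrain on inputs shifted by the shared perturbation, keeping the clean labels."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = (x + delta).clamp(0, 1)
            loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Under these assumptions, a small subset of the training data is enough to both craft `delta` and run the retraining loop, mirroring the abstract's claim that RAT needs only a very small dataset.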