Deep neural networks have been successfully applied to various machine learning tasks. However, studies show that neural networks are susceptible to adversarial attacks, which poses a potential threat to neural network-based intelligent systems. We observe that applying small first-order perturbations generated for non-predicted class labels to an adversarial example increases the probability that the network outputs the correct result. Based on this observation, we propose a method that counteracts adversarial perturbations to improve adversarial robustness. The method randomly selects a number of class labels and generates a small first-order perturbation for each selected label; the generated perturbations are summed and then clamped onto a specified space, and the resulting perturbation is added to the adversarial example to counteract the adversarial perturbation it contains. The method is applied at inference time and requires neither retraining nor fine-tuning the model. We experimentally validate the proposed method on CIFAR-10 and CIFAR-100. The results demonstrate that our method effectively improves the defense performance of several transformation-based defense methods, especially against strong adversarial examples generated using more attack iterations.
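As a concrete illustration of the procedure described above, the following is a minimal PyTorch-style sketch, assuming a classifier `model` that outputs logits over `num_classes` classes and inputs scaled to [0, 1]. The function name `counteract`, the number of sampled labels, the step size, and the clamping bound `eps` are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def counteract(model, x_adv, num_sampled=3, step=2/255, eps=8/255):
    """Counteract the adversarial perturbation in x_adv by adding summed,
    clamped first-order perturbations generated for randomly sampled
    non-predicted class labels. Hyperparameter values are assumptions."""
    model.eval()
    with torch.no_grad():
        logits = model(x_adv)
    num_classes = logits.shape[1]
    pred = logits.argmax(dim=1)  # current (possibly wrong) prediction

    total = torch.zeros_like(x_adv)
    for _ in range(num_sampled):
        # Sample one random class label per example, excluding the
        # predicted label.
        target = torch.randint(num_classes, pred.shape, device=x_adv.device)
        target = torch.where(target == pred, (target + 1) % num_classes, target)

        # One first-order, signed-gradient step toward the sampled label,
        # i.e. descending the cross-entropy loss for that label.
        x = x_adv.detach().clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), target)
        grad = torch.autograd.grad(loss, x)[0]
        total = total - step * grad.sign()

    # Clamp the summed perturbation onto the specified space (here an
    # L-infinity ball of radius eps), add it to the adversarial example,
    # and keep pixel values valid (assuming inputs in [0, 1]).
    delta = total.clamp(-eps, eps)
    return (x_adv + delta).clamp(0.0, 1.0)
```

In this sketch each sampled label contributes one targeted FGSM-style step; summing the steps and clamping the total keeps the counteracting perturbation small, matching the abstract's description of clamping onto a specified space before adding the result to the adversarial example.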