Recent improvements in deep learning models and their practical applications have raised concerns about the robustness of these models against adversarial examples. Adversarial training (AT) has been shown to be effective at producing a model that is robust to the attack used during training. However, the resulting model usually fails against other attacks; that is, it overfits to the training attack scheme. In this paper, we propose a simple modification to AT that mitigates this issue. More specifically, we minimize the $\ell_p$ norm of the perturbation while maximizing the classification loss, combining the two objectives in Lagrangian form. We argue that crafting adversarial examples with this scheme improves the learned model's generalization to unseen attacks. We compare the robust accuracy of our final model against attacks not used during training with that of closely related state-of-the-art AT methods. This comparison shows that our average robust accuracy against unseen attacks is 5.9% higher on the CIFAR-10 dataset and 3.2% higher on the ImageNet-100 dataset than that of the corresponding state-of-the-art methods. We also demonstrate that our attack is faster than other attack schemes designed for unseen-attack generalization, and conclude that it is feasible for large-scale datasets.
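To make the Lagrangian-form objective concrete, the following is a minimal sketch of one possible reading of it, not the paper's implementation. It searches for a perturbation `delta` that increases the classification loss while a penalty weighted by `lam` keeps its norm small. Everything specific here is an assumption made for illustration: the toy logistic model (`w`, `b`), the use of the squared $\ell_2$ norm as the penalty (chosen because it is differentiable at zero), plain gradient ascent, and the names `lagrangian_attack`, `lam`, `step`, and `n_steps`.

```python
import math


def logistic_loss(w, b, x, y):
    """Cross-entropy loss of a logistic model on one example; y is 0 or 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    s = z if y == 1 else -z  # signed margin of the true class
    return math.log1p(math.exp(-s))


def loss_grad_x(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
    return [(p - y) * wi for wi in w]


def lagrangian_attack(w, b, x, y, lam=1.0, step=0.05, n_steps=200):
    """Gradient ascent on loss(x + delta) - lam * ||delta||_2^2.

    Hypothetical illustration: the squared l2 penalty and the plain
    gradient-ascent update are assumptions, not the paper's method.
    """
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = loss_grad_x(w, b, x_adv, y)
        # Ascent direction: loss gradient minus the gradient of the norm penalty.
        delta = [di + step * (gi - 2.0 * lam * di) for di, gi in zip(delta, g)]
    return delta


# Toy usage with made-up parameters.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 1.0], 1
delta = lagrangian_attack(w, b, x, y, lam=1.0)
x_adv = [xi + di for xi, di in zip(x, delta)]
print("clean loss:", logistic_loss(w, b, x, y))
print("adversarial loss:", logistic_loss(w, b, x_adv, y))
```

In this sketch, `lam` sets the trade-off between the two terms: a larger value penalizes the perturbation more heavily and yields a smaller `delta`.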