Deep neural networks are easily fooled by imperceptible perturbations. At present, adversarial training (AT) is the most effective method for enhancing a model's robustness against adversarial examples. However, because adversarial training solves a min-max optimization problem, robustness and generalization are in tension compared with natural training: improving the robustness of the model tends to decrease its generalization. To address this issue, this paper introduces a new concept, the confidence threshold (CT), and proves that reducing the confidence threshold, referred to as confidence threshold reduction (CTR), improves both the generalization and the robustness of the model. Specifically, to reduce the CT for natural training (i.e., natural training with CTR), we propose a mask-guided divergence loss (MDL) consisting of a cross-entropy term and an orthogonal term. Empirical and theoretical analysis demonstrates that the MDL loss simultaneously improves the robustness and generalization of naturally trained models. However, the robustness gained by natural training with CTR is not comparable to that of adversarial training. Therefore, for adversarial training, we propose a standard deviation loss (STD), which minimizes the differences among the probabilities of the wrong categories and is integrated into the adversarial training loss to reduce the CT. Empirical and theoretical analysis demonstrates that the STD-based loss further improves the robustness of adversarially trained models while keeping the natural accuracy unchanged or slightly improved.
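To make the STD idea concrete, the following is a minimal PyTorch sketch under our own assumptions (the function name std_wrong_class_loss and the weight lambda_std are hypothetical, and this is not necessarily the authors' exact formulation): it penalizes the per-sample standard deviation of the softmax probabilities assigned to the wrong classes, pushing those probabilities toward uniformity.

```python
import torch
import torch.nn.functional as F

def std_wrong_class_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Standard deviation of the probabilities of the wrong classes.

    logits: (batch, num_classes) raw model outputs.
    targets: (batch,) integer class labels.
    """
    probs = F.softmax(logits, dim=1)                             # predicted class probabilities
    true_mask = F.one_hot(targets, num_classes=probs.size(1)).bool()
    wrong_probs = probs[~true_mask].view(probs.size(0), -1)      # drop the true-class probability
    return wrong_probs.std(dim=1).mean()                         # mean per-sample standard deviation

# Hypothetical usage inside an adversarial training step, added to the usual
# cross-entropy on adversarial examples (lambda_std is an assumed weight):
#   loss = F.cross_entropy(logits_adv, targets) + lambda_std * std_wrong_class_loss(logits_adv, targets)
```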