Deep neural networks are widely used across many fields because of their strong performance. However, recent studies have shown that deep learning models are vulnerable to adversarial attacks, i.e., adding a slight perturbation to the input can cause the model to produce incorrect results. This is especially dangerous for systems with high security requirements, so this paper proposes a new defense method that uses the model's super-fitting state to improve its adversarial robustness (i.e., its accuracy under adversarial attack). This paper mathematically proves the effectiveness of super-fitting and drives the model into this state quickly by minimizing unrelated category scores (MUCS). Theoretically, super-fitting can resist any existing, and even future, cross-entropy (CE)-based white-box adversarial attack. In addition, this paper uses a variety of powerful attack algorithms to evaluate the adversarial robustness of super-fitting, and the proposed method is compared with nearly 50 defense models from recent conferences. The experimental results show that models trained with the proposed super-fitting method achieve the highest adversarial robustness among the compared defenses.
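To make the MUCS idea concrete, the following is a minimal PyTorch sketch of what a "minimize unrelated category scores" training objective could look like. The squared-logit penalty, the `mucs_loss` name, and the weighting factor `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def mucs_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Hypothetical 'minimize unrelated category scores' term: pushes the
    logits of every non-label class toward zero via their mean squared value."""
    num_classes = logits.size(1)
    # Boolean mask that is True only at the true-label position of each row.
    mask = F.one_hot(labels, num_classes).bool()
    # Zero out the true-class logit so only unrelated class scores remain.
    unrelated = logits.masked_fill(mask, 0.0)
    # Penalize the magnitude of the remaining (unrelated) class scores.
    return (unrelated ** 2).sum(dim=1).mean() / (num_classes - 1)


def training_loss(logits: torch.Tensor, labels: torch.Tensor,
                  lam: float = 1.0) -> torch.Tensor:
    # Standard cross-entropy plus the assumed MUCS penalty.
    return F.cross_entropy(logits, labels) + lam * mucs_loss(logits, labels)
```

Under this reading, suppressing the scores of all unrelated categories shrinks the gradient signal that CE-based white-box attacks rely on, which is consistent with the robustness claim in the abstract.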