Deep neural networks (DNNs) for image classification are known to be vulnerable to adversarial examples. Moreover, adversarial examples exhibit transferability: an adversarial example crafted for one DNN model can fool another black-box model with a non-trivial probability. This gave birth to transfer-based adversarial attacks, in which adversarial examples generated with a pretrained or known model (called the surrogate model) are used to mount black-box attacks. There has been some work on how to generate adversarial examples from a given surrogate model to achieve better transferability. However, training a dedicated surrogate model whose adversarial examples transfer better remains relatively under-explored. In this paper, we propose a method for training a surrogate model with abundant dark knowledge to boost the transferability of the adversarial examples it generates. The trained surrogate model is named the dark surrogate model (DSM), and the proposed method for training a DSM consists of two key components: a teacher model that extracts dark knowledge and provides soft labels, and a mixing augmentation technique that enhances the dark knowledge of the training data. Extensive experiments show that the proposed method substantially improves the adversarial transferability of the surrogate model across different surrogate-model architectures and across different optimizers for generating adversarial examples. We also show that the proposed method can be applied to other transfer-based attack scenarios that contain dark knowledge, such as face verification.
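The two components described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a MixUp-style mixing augmentation and uses the teacher's softmax outputs as soft labels, with illustrative function names (`mixup`, `soft_cross_entropy`):

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha=1.0, rng=None):
    """MixUp-style augmentation: convexly combine two inputs and their
    teacher-provided soft labels, enriching the dark knowledge of the pair."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def soft_cross_entropy(logits, soft_labels):
    """Cross-entropy of the student's logits against a soft (dark-knowledge)
    label distribution, instead of a one-hot label."""
    logits = logits - logits.max()                 # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -(soft_labels * log_probs).sum()
```

A surrogate model trained by minimizing `soft_cross_entropy` on mixed samples would thus see label distributions that carry inter-class similarity information, rather than hard one-hot targets.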