Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). Moreover, AEs exhibit adversarial transferability: AEs generated for a source model can also fool other (target) models. In this paper, we investigate for the first time the transferability of models encrypted for adversarially robust defense. To objectively verify this transferability property, the robustness of the models is evaluated with a benchmark attack method called AutoAttack. In an image-classification experiment, the use of encrypted models is confirmed not only to be robust against AEs but also to reduce the influence of AEs in terms of model transferability.