Adversarial attacks have demonstrated the vulnerability of neural networks. By adding small perturbations to a benign example, an adversarial attack generates an adversarial example that causes a deep learning model to misclassify. More importantly, an adversarial example crafted on one specific model can also deceive other models without modification; we call this phenomenon ``transferability''. Here, we analyze the relationship between transferability and input transformation with additive noise, mathematically proving that the modified optimization produces more transferable adversarial examples.
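For illustration only, the snippet below is a minimal sketch (not the paper's exact method) of crafting an adversarial example with a single FGSM-style step whose gradient is averaged over inputs transformed by additive Gaussian noise; the model, `epsilon`, `sigma`, and `n_samples` are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in classifier; any differentiable model could be used here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss_fn = nn.CrossEntropyLoss()

def noisy_fgsm(x, y, epsilon=8 / 255, sigma=0.05, n_samples=4):
    """One FGSM step whose gradient is averaged over noise-augmented inputs."""
    x = x.clone().detach().requires_grad_(True)
    loss = 0.0
    for _ in range(n_samples):
        # Input transformation with additive Gaussian noise before the forward pass.
        loss = loss + loss_fn(model(x + sigma * torch.randn_like(x)), y)
    (loss / n_samples).backward()
    # Perturb in the direction that increases the averaged loss.
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

benign = torch.rand(1, 3, 32, 32)          # benign example
label = torch.tensor([3])                  # its true label
adversarial = noisy_fgsm(benign, label)    # adversarial example
print((adversarial - benign).abs().max())  # perturbation stays within epsilon
```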