Deep neural networks are known to be extremely vulnerable to adversarial examples in the white-box setting. Moreover, adversarial examples crafted on a surrogate (source) model often exhibit black-box transferability to other models that share the same learning task but have different architectures. Recently, various methods have been proposed to boost adversarial transferability, among which input transformation is one of the most effective approaches. We investigate this direction and observe that existing transformations are all applied to a single image, which might limit adversarial transferability. To this end, we propose a new input transformation based attack method called Admix that considers the input image together with a set of images randomly sampled from other categories. Instead of directly calculating the gradient on the original input, Admix calculates the gradient on the input image admixed with a small portion of each add-in image, while using the original label of the input, to craft more transferable adversarial examples. Empirical evaluations on the standard ImageNet dataset demonstrate that Admix achieves significantly better transferability than existing input transformation methods under both the single-model setting and the ensemble-model setting. When combined with existing input transformations, our method further improves transferability and outperforms the state-of-the-art combination of input transformations by a clear margin when attacking nine advanced defense models under the ensemble-model setting. Code is available at https://github.com/JHL-HUST/Admix.
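The gradient computation described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a toy linear softmax classifier in place of a deep network, and the names `loss_gradient`, `sample_add_ins`, `admix_gradient`, and the mixing strength `eta` with its default value are hypothetical choices made only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_gradient(weights, x, label):
    """Gradient of the cross-entropy loss w.r.t. the input x.

    Assumption: the model is a toy linear classifier, logits = W x,
    where `weights` holds one row per class.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    probs = softmax(logits)
    grad = [0.0] * len(x)
    for k, row in enumerate(weights):
        # dL/dlogit_k = p_k - 1[k == label]; chain rule through W x.
        coeff = probs[k] - (1.0 if k == label else 0.0)
        for j, w in enumerate(row):
            grad[j] += coeff * w
    return grad


def sample_add_ins(pool, label, count, rng):
    """Randomly pick `count` images whose category differs from `label`.

    `pool` is a list of (image, category) pairs (hypothetical layout).
    """
    candidates = [img for img, cat in pool if cat != label]
    return rng.sample(candidates, count)


def admix_gradient(weights, x, label, add_in_images, eta=0.2):
    """Average the input gradient over admixed copies of x.

    Each copy is x plus a small portion (eta, an assumed value) of one
    add-in image; the loss always uses the ORIGINAL label of x.
    """
    avg = [0.0] * len(x)
    for other in add_in_images:
        mixed = [xi + eta * oi for xi, oi in zip(x, other)]
        g = loss_gradient(weights, mixed, label)
        avg = [a + gi / len(add_in_images) for a, gi in zip(avg, g)]
    return avg


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]  # 2 classes, 3 "pixels"
    x, y = [0.2, 0.7, 0.1], 0
    pool = [([0.9, 0.1, 0.3], 1), ([0.4, 0.4, 0.8], 1), ([0.1, 0.1, 0.1], 0)]
    add_ins = sample_add_ins(pool, y, 2, rng)
    print(admix_gradient(W, x, y, add_ins))
```

The averaged gradient would then feed whatever gradient-based update the attack uses; that update step is deliberately left out here because the abstract does not specify it.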