Given the severe threat that adversarial attacks pose to Deep Neural Networks (DNNs), numerous works have sought to boost the transferability of adversarial examples in order to attack real-world applications. However, existing attacks typically rely on advanced gradient calculation or input transformation while ignoring the white-box model itself. Inspired by the fact that DNNs are over-parameterized for superior performance, we propose diversifying the high-level features (DHF) to craft more transferable adversarial examples. Specifically, at each iteration DHF perturbs the high-level features by randomly transforming them and mixing them with the features of benign samples when calculating the gradient. Owing to the redundancy of parameters, such transformation does not affect classification performance but helps identify features that are invariant across different models, leading to much better transferability. Empirical evaluations on the ImageNet dataset show that DHF effectively improves the transferability of existing momentum-based attacks. Incorporated into input transformation-based attacks, DHF generates more transferable adversarial examples and outperforms the baselines by a clear margin when attacking several defense models, demonstrating its generalization to various attacks and its high effectiveness in boosting transferability.
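The feature diversification step described above can be sketched as follows. This is a minimal illustrative sketch in plain Python: the function, parameter names (`mix_ratio`, `scale_range`), and the specific choices of random scaling and convex mixing are assumptions for exposition, not the paper's exact formulation; in practice the transformation would be applied to intermediate feature maps of the white-box model (e.g. via forward hooks) at each attack iteration.

```python
import random

def dhf_transform(features, benign_features, mix_ratio=0.1,
                  scale_range=(0.9, 1.1), seed=None):
    """Illustrative DHF-style diversification (names are hypothetical):
    randomly rescale each high-level feature value, and replace a random
    subset of them with a convex mix of the benign sample's feature."""
    rng = random.Random(seed)
    out = []
    for f, b in zip(features, benign_features):
        # Random scaling perturbs the feature; over-parameterization means
        # small rescalings barely change the classification output.
        f = f * rng.uniform(*scale_range)
        # With probability mix_ratio, mix in the corresponding benign feature,
        # pushing the gradient toward model-invariant directions.
        if rng.random() < mix_ratio:
            eta = rng.random()
            f = eta * f + (1.0 - eta) * b
        out.append(f)
    return out
```

A fresh random transformation would be drawn at every iteration of the underlying (e.g. momentum-based) attack, so the gradient is averaged over many diversified versions of the same high-level representation.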