Malicious attackers can generate targeted adversarial examples by imposing human-imperceptible noise on images, forcing neural network models to produce specific incorrect outputs. With cross-model transferable adversarial examples, neural networks remain vulnerable even when the model information is kept secret from the attacker. Recent studies have shown the effectiveness of ensemble-based methods in generating transferable adversarial examples. However, existing methods fall short in the more challenging scenario of crafting targeted attacks that transfer among distinct models. In this work, we propose Diversified Weight Pruning (DWP), which further enhances ensemble-based methods by leveraging the weight pruning technique commonly used in model compression. Specifically, we obtain multiple diverse models via random weight pruning. These models preserve comparable accuracies and can serve as additional models for ensemble-based methods, yielding stronger transferable targeted attacks. We provide experiments on the ImageNet-Compatible dataset under two challenging scenarios: transferring to distinct architectures and to adversarially trained models. The results show that our proposed DWP improves the targeted attack success rate by up to 4.1% and 8.0%, respectively, on combinations of state-of-the-art methods.
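The core diversification step described above, randomly pruning a model's weights to obtain several diverse yet similarly accurate copies, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the paper's implementation; the function name `random_prune` and the pruning ratio are hypothetical.

```python
import numpy as np

def random_prune(weights, prune_ratio, rng):
    """Zero out a random fraction of the entries of a weight array.

    Each call draws a fresh random mask, so repeated calls on the same
    weights yield differently pruned (diversified) copies. A small
    prune_ratio leaves most weights intact, so accuracy stays close to
    the original model's.
    """
    mask = rng.random(weights.shape) >= prune_ratio  # keep ~(1 - ratio) of entries
    return weights * mask

# Produce several diversified copies of one weight matrix for an ensemble.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned_copies = [random_prune(w, prune_ratio=0.1, rng=rng) for _ in range(3)]
```

In the ensemble-attack setting, each pruned copy would act as one more surrogate model whose gradients are averaged when crafting the adversarial example.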