Malicious attackers can craft targeted adversarial examples by adding tiny perturbations, forcing neural networks to produce specific incorrect outputs. Owing to cross-model transferability, network models remain vulnerable even in black-box settings. Recent studies have shown the effectiveness of ensemble-based methods in generating transferable adversarial examples. To further enhance transferability, model augmentation methods aim to produce more models to participate in the ensemble. However, existing model augmentation methods have only been proven effective in untargeted attacks. In this work, we propose Diversified Weight Pruning (DWP), a novel model augmentation technique for generating transferable targeted adversarial examples. DWP leverages the weight pruning method commonly used in model compression. Compared with prior work, DWP simultaneously protects necessary connections and ensures the diversity of the pruned models, both of which we show are crucial for targeted transferability. Experiments on the ImageNet-compatible dataset under varied and more challenging scenarios confirm its effectiveness: transferring to adversarially trained models, non-CNN architectures, and Google Cloud Vision. The results show that DWP improves the targeted attack success rate by up to $10.1$%, $6.6$%, and $7.0$% in these three scenarios, respectively, when combined with state-of-the-art methods. The source code will be made available after acceptance.
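To make the core idea concrete, the following is a minimal sketch of DWP-style model augmentation, not the authors' actual implementation. It assumes a flat list of weights and illustrates the two properties the abstract names: the largest-magnitude ("necessary") connections are protected from pruning, and the remaining weights are pruned at random so that different seeds yield diverse pruned models for the ensemble. The function name, ratios, and toy weights are all hypothetical.

```python
import random

def diversified_prune(weights, prune_ratio=0.3, protect_ratio=0.2, seed=0):
    """Hypothetical sketch of DWP-style pruning on a flat weight list.

    Protects the largest-magnitude ('necessary') connections, then
    randomly zeroes a subset of the remaining weights, so repeated
    calls with different seeds produce diverse pruned models.
    """
    rng = random.Random(seed)
    n = len(weights)
    # Rank connections by magnitude; the top fraction is never pruned.
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    protected = set(order[: int(n * protect_ratio)])
    # Randomly choose which of the unprotected weights to zero out.
    candidates = [i for i in range(n) if i not in protected]
    to_prune = set(rng.sample(candidates, int(n * prune_ratio)))
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]

# Generating several diverse pruned copies for an ensemble:
w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, 0.03, -0.3]
variants = [diversified_prune(w, seed=s) for s in range(3)]
```

In an actual attack pipeline, each pruned variant would stand in for an additional ensemble member when computing the adversarial gradient, at no extra training cost.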