Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples. In this paper, we posit that adversarial attacks on transformers should be specially tailored for their architecture, jointly considering both patches and self-attention, in order to achieve high transferability. More specifically, we introduce a dual attack framework, consisting of a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial examples across different ViTs. We show that skipping the gradients of attention during backpropagation yields adversarial examples with higher transferability. In addition, adversarial perturbations generated by optimizing a randomly sampled subset of patches at each iteration achieve higher attack success rates than attacks that use all patches. We evaluate the transferability of these attacks on state-of-the-art ViTs, CNNs, and robustly trained CNNs. The results demonstrate that the proposed dual attack greatly boosts transferability both between ViTs and from ViTs to CNNs. Moreover, the proposed method can easily be combined with existing transfer methods for a further performance boost. Code is available at https://github.com/zhipeng-wei/PNA-PatchOut.
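Both components are simple to sketch in code. Below is a minimal, hypothetical PyTorch sketch, not the authors' released implementation (see the linked repository for that): a self-attention module that detaches its attention map during backpropagation, which is the gradient-skipping idea behind PNA, and a PatchOut-style binary mask that restricts each attack iteration to a random subset of patches. All names, the BIM-style update, and the hyperparameters (patch size 16, 130 sampled patches, \(\epsilon = 16/255\)) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PNAAttention(nn.Module):
    """Self-attention whose attention map is detached during backprop,
    so gradients flow only through the value path (the PNA idea)."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        attn = attn.detach()  # "pay no attention": skip this gradient path
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


def patchout_mask(img_size, patch, keep, device):
    """Pixel-level binary mask exposing a random subset of `keep` patches."""
    n = img_size // patch
    idx = torch.randperm(n * n, device=device)[:keep]
    mask = torch.zeros(n * n, device=device)
    mask[idx] = 1.0
    mask = mask.view(1, 1, n, n)
    # Upsample each patch-level entry to a patch x patch pixel block.
    return mask.repeat_interleave(patch, 2).repeat_interleave(patch, 3)


def patchout_pna_attack(model, x, y, eps=16 / 255, alpha=2 / 255,
                        steps=10, patch=16, keep=130):
    """BIM-style attack: each step perturbs only a random patch subset.
    `model` is assumed to be a source ViT whose attention blocks already
    apply the PNA detach (e.g. modules like PNAAttention above)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        mask = patchout_mask(x.shape[-1], patch, keep, x.device)
        loss = F.cross_entropy(model(x + delta * mask), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign() * mask  # update sampled patches
            delta.clamp_(-eps, eps)                    # keep the L-inf budget
        delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()
```

In practice, PNA would be applied by patching the attention blocks of a pretrained source ViT rather than training a module from scratch; the sketch above only isolates the gradient-skipping trick and the per-iteration patch sampling.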