Vision Transformers (ViTs) have a radically different architecture from Convolutional Neural Networks (CNNs), with significantly less inductive bias. Alongside their improvements in performance, the security and robustness of ViTs are also important to study. In contrast to many recent works that explore the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, namely the backdoor attack. We first examine the vulnerability of ViTs to various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and the backdoor attack success rate of ViTs respond distinctively to patch transformations applied before the positional encoding. Building on this finding, we propose an effective patch-processing method that defends ViTs against both patch-based and blending-based trigger backdoor attacks. Evaluations on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, show that the proposed defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that exploits a unique characteristic of ViTs to defend against backdoor attacks.
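To make "patch transformations applied before the positional encoding" concrete, the following is a minimal NumPy sketch, not the paper's implementation. It splits an image into ViT-style patches, applies two example transformations (random shuffling and random dropping) to the patch sequence, and only then adds positional embeddings by slot index, so the transformation changes which content occupies which position. The function names (`patchify`, `patch_shuffle`, `patch_drop`, `add_positional_encoding`, `is_suspicious`), the zero-masking form of dropping, and the prediction-flip rule with its `threshold` are all hypothetical choices for illustration; `predict` stands for any classifier, assumed to apply the embedding, positional encoding and remaining ViT layers internally.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into an (N, patch_size * patch_size * C) sequence
    of flattened, non-overlapping patches in row-major order."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c): group pixels by patch grid cell.
    grid = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(gh * gw, patch_size * patch_size * c)


def patch_shuffle(patches: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly permute the patch sequence. Positional encodings are added
    afterwards by slot index, so each patch ends up at a new position."""
    return patches[rng.permutation(patches.shape[0])]


def patch_drop(patches: np.ndarray, drop_ratio: float,
               rng: np.random.Generator) -> np.ndarray:
    """Zero out a random subset of patches (a masking variant of dropping that
    keeps the sequence length, and hence the positional slots, unchanged)."""
    n = patches.shape[0]
    out = patches.copy()
    out[rng.choice(n, size=int(n * drop_ratio), replace=False)] = 0
    return out


def add_positional_encoding(patches: np.ndarray, pos_embed: np.ndarray) -> np.ndarray:
    """Add one positional embedding per slot (the linear patch projection is
    omitted for brevity). Any transformation applied before this step changes
    which content sits in which slot."""
    return patches + pos_embed[: patches.shape[0]]


def is_suspicious(image, predict, patch_size, n_trials=8, drop_ratio=0.3,
                  threshold=0.5, seed=0) -> bool:
    """Hypothetical test-time check: flag the input if the predicted label
    changes on more than `threshold` of the patch-transformed copies."""
    rng = np.random.default_rng(seed)
    patches = patchify(image, patch_size)
    base = predict(patches)
    flips = 0
    for trial in range(n_trials):
        # Alternate between the two example transformations.
        if trial % 2 == 0:
            transformed = patch_shuffle(patches, rng)
        else:
            transformed = patch_drop(patches, drop_ratio, rng)
        flips += int(predict(transformed) != base)
    return flips / n_trials > threshold
```

Masking dropped patches with zeros, rather than removing them, is a design choice of this sketch: it keeps every positional slot occupied so the same `pos_embed` table still applies. Removing the tokens and gathering only the matching position embeddings would be an equally plausible variant, and the direction and size of the effect on clean versus backdoored inputs would have to be measured on a trained ViT.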