Vision Transformers (ViTs) have a radically different architecture from Convolutional Neural Networks (CNNs), with significantly less inductive bias. Alongside their gains in performance, the security and robustness of ViTs are also important to study. In contrast to many recent works that study the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, namely the backdoor attack. We first examine the vulnerability of ViTs to various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and the backdoor attack success rate of ViTs respond distinctively to patch transformations applied before the positional encoding. Based on this finding, we propose an effective patch-processing method that enables ViTs to defend against both patch-based and blending-based trigger backdoor attacks. We evaluate the method on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, and the results show that the proposed defense is very successful in mitigating backdoor attacks on ViTs. To the best of our knowledge, this paper presents the first defensive strategy that exploits a unique characteristic of ViTs against backdoor attacks. The paper will appear in the Proceedings of the AAAI'23 Conference. This work was initially submitted to CVPR'22 in November 2021 and then re-submitted to ECCV'22. The paper was made public in June 2022. The authors sincerely thank all the referees from the Program Committees of CVPR'22, ECCV'22, and AAAI'23.
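The key observation above concerns patch transformations applied to a ViT's token sequence before the positional encoding is added. The sketch below illustrates what such transformations can look like and one way to measure how a classifier's prediction responds to them. It is a minimal illustration under stated assumptions, not the paper's algorithm: the helper names (`patchify`, `patch_drop`, `patch_shuffle`, `prediction_change_rate`), the drop ratio, the number of trials, and the use of a label-change rate as the score are all hypothetical. `model` stands in for any callable that maps a patch-token sequence (before positional embeddings) to a class label.

```python
import numpy as np
from typing import Callable

# Hypothetical classifier interface (an assumption, not from the paper): takes a
# (num_patches, patch_dim) token sequence before positional encoding and
# returns an integer class label.
Classifier = Callable[[np.ndarray], int]


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into a (num_patches, patch*patch*C) token
    sequence in row-major patch order, as a ViT does before adding positions."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


def patch_drop(tokens: np.ndarray, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Zero out a random subset of patch tokens (a hypothetical PatchDrop-style transform)."""
    out = tokens.copy()
    n_drop = int(round(ratio * len(tokens)))
    idx = rng.choice(len(tokens), size=n_drop, replace=False)
    out[idx] = 0
    return out


def patch_shuffle(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly permute the patch order. Because positions are assigned afterwards,
    the model sees the same patch contents in a scrambled spatial layout."""
    return tokens[rng.permutation(len(tokens))]


def prediction_change_rate(tokens: np.ndarray, model: Classifier, n_trials: int = 20,
                           drop_ratio: float = 0.25, seed: int = 0) -> float:
    """Fraction of random patch transformations (alternating drop and shuffle)
    that change the model's predicted label for this input. The defaults are
    illustrative assumptions, not values from the paper."""
    rng = np.random.default_rng(seed)
    base = model(tokens)
    changed = 0
    for t in range(n_trials):
        if t % 2 == 0:
            transformed = patch_drop(tokens, drop_ratio, rng)
        else:
            transformed = patch_shuffle(tokens, rng)
        changed += int(model(transformed) != base)
    return changed / n_trials


if __name__ == "__main__":
    image = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
    tokens = patchify(image, patch=2)  # 4 patches of 2*2*3 = 12 values each
    # Placeholder "model" (not a ViT) that only looks at the first token's total intensity.
    toy_model = lambda toks: int(toks[0].sum() > 50)
    print(tokens.shape, prediction_change_rate(tokens, toy_model))
```

How such a score would be thresholded, and whether clean or triggered inputs fall on its high side, is not stated in the abstract; in this sketch that decision is left open and would need to be calibrated on held-out data for a real backdoored ViT.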