Vision Transformers (ViTs) have demonstrated state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on them. Although the vulnerability of traditional CNNs to backdoor attacks is well known, backdoor attacks on ViTs are seldom studied. Compared to CNNs, which capture pixel-wise local features by convolutions, ViTs extract global context information through patches and attention. Na\"ively transplanting CNN-specific backdoor attacks to ViTs yields only low clean data accuracy and a low attack success rate. In this paper, we propose a stealthy and practical ViT-specific backdoor attack, $TrojViT$. Rather than the area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger, via patch salience ranking and an attention-target loss, designed to build a Trojan consisting of a small set of vulnerable bits in the parameters of a ViT stored in DRAM. TrojViT further uses a minimum-tuned parameter update to reduce the number of bits in the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping these vulnerable bits, the model still produces normal inference accuracy on benign inputs, but when the attacker embeds the trigger into an input, the model is forced to classify that input to a predefined target class. We show that flipping only a few vulnerable bits identified by TrojViT, using the well-known RowHammer attack, can transform a ViT model into a backdoored one. We perform extensive experiments on multiple datasets with various ViT models. TrojViT classifies $99.64\%$ of test images to a target class by flipping $345$ bits of a ViT trained on ImageNet.
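To make the patch salience ranking step concrete, the following is a minimal PyTorch sketch: it scores every input patch by the gradient magnitude of a target-class loss and returns the most salient patches as candidate trigger locations. The function name, the 224x224/16x16 geometry, and the use of plain cross-entropy are illustrative assumptions, not the paper's exact procedure (which optimizes an attention-target loss).

```python
# A minimal sketch of patch salience ranking, assuming a torch ViT classifier
# over 224x224 images with 16x16 patches; names and loss are illustrative.
import torch
import torch.nn.functional as F

def rank_patch_salience(model, image, target_class, patch_size=16, num_patches=4):
    """Score each input patch by the gradient magnitude of the target-class
    loss, then return the indices of the most salient patches."""
    image = image.clone().requires_grad_(True)           # (1, 3, 224, 224)
    loss = F.cross_entropy(model(image), torch.tensor([target_class]))
    loss.backward()
    grad = image.grad.abs().sum(dim=1)                   # (1, 224, 224), summed over channels
    # Aggregate |gradient| inside each non-overlapping patch_size x patch_size patch.
    scores = (grad.unfold(1, patch_size, patch_size)
                  .unfold(2, patch_size, patch_size)
                  .sum(dim=(-1, -2))
                  .flatten())                            # one score per patch
    return scores.topk(num_patches).indices              # most salient patch indices
```

In the full attack, the trigger pixels inside the selected patches would then be optimized against the attention-target loss, and the Trojan's vulnerable bits identified from the resulting parameter updates.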