Backdoor attacks pose an emerging threat to AI security: Deep Neural Networks (DNNs) are trained on datasets poisoned with hidden trigger patterns. Although the poisoned model behaves normally on benign samples, it produces anomalous outputs on samples containing the trigger pattern. However, most existing backdoor attacks suffer from two significant drawbacks: their trigger patterns are visible and easily detected by human inspection, and their injection process degrades natural sample features and trigger patterns alike, reducing both the attack success rate and model accuracy. In this paper, we propose a novel backdoor attack, SATBA, that overcomes these limitations using a spatial attention mechanism and a U-type model. Our attack leverages spatial attention to extract data features and generate invisible trigger patterns correlated with the clean data; it then uses a U-type model to plant these triggers into the original data without causing noticeable feature loss. We evaluate our attack on three prominent image classification DNNs across three standard datasets and demonstrate that it achieves a high attack success rate and robustness against backdoor defenses. Additionally, we conduct extensive image-similarity experiments to highlight the stealthiness of our attack.
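The trigger-injection idea above can be illustrated with a minimal sketch. This is not the authors' actual SATBA implementation (which uses a learned spatial attention module and a trained U-type network); it is a hypothetical stand-in in which the attention map is approximated by a normalized channel-wise mean, and the U-type model is replaced by an attention-weighted blend, just to show how a trigger can stay correlated with image content while remaining imperceptible. The function names, `eps` parameter, and image shapes are illustrative assumptions.

```python
import numpy as np

def spatial_attention_map(image):
    """Hypothetical spatial attention: average the HxWxC image over
    channels, then normalize to [0, 1] so that salient regions receive
    larger weights. (SATBA learns this map; here we approximate it.)"""
    attn = image.mean(axis=2)  # HxW saliency proxy
    rng = attn.max() - attn.min()
    return (attn - attn.min()) / (rng + 1e-8)

def inject_trigger(image, pattern, eps=0.02):
    """Blend a trigger pattern into the image, scaled per-pixel by the
    attention map, so the perturbation is tied to image content and
    bounded by eps (illustrative stand-in for the U-type model)."""
    attn = spatial_attention_map(image)            # HxW
    perturb = eps * attn[..., None] * pattern      # HxWxC, |perturb| <= eps*|pattern|
    return np.clip(image + perturb, 0.0, 1.0)

# Demo: poison one random "image" with a random trigger pattern.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))                  # clean sample in [0, 1]
pattern = rng.standard_normal((32, 32, 3))     # trigger pattern
poisoned = inject_trigger(img, pattern)
max_change = float(np.abs(poisoned - img).max())
```

Because the perturbation is modulated by the attention map and scaled by a small `eps`, the poisoned image differs from the clean one by at most `eps * |pattern|` per pixel, which is the intuition behind the invisibility and low feature loss claimed in the abstract.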