Backdoor attacks have emerged as a novel and concerning threat to AI security. These attacks train a Deep Neural Network (DNN) on a dataset that contains hidden trigger patterns. Although the poisoned model behaves normally on benign samples, it exhibits abnormal behavior on samples containing the trigger pattern. However, most existing backdoor attacks suffer from two significant drawbacks: their trigger patterns are visible and easily detected by backdoor defenses or even human inspection, and their injection process loses features of both the natural sample and the trigger pattern, thereby reducing the attack success rate and model accuracy. In this paper, we propose a novel backdoor attack named SATBA that overcomes these limitations using spatial attention and a U-Net-based model. The attack first uses spatial attention to extract meaningful data features and generate trigger patterns associated with clean images. Then, a U-shaped model embeds these trigger patterns into the original data without causing noticeable feature loss. We evaluate our attack on three prominent image classification DNNs across three standard datasets. The results demonstrate that SATBA achieves a high attack success rate while maintaining robustness against backdoor defenses. Furthermore, we conduct extensive image similarity experiments to demonstrate the stealthiness of our attack strategy. Overall, SATBA presents a promising approach to backdoor attacks, addressing the shortcomings of previous methods, evading detection, and maintaining a high attack success rate.