As an emerging area of AI security, backdoor attacks have drawn growing research attention in recent years. It is well known that a backdoor can be injected into a DNN model by training it on a poisoned dataset consisting of poisoned samples. The infected model outputs correct predictions on benign samples yet behaves abnormally on poisoned samples that contain the trigger pattern. In most existing attacks, the triggers embedded in poisoned samples are visible and easily detected by human visual inspection, and the trigger-injection process causes a loss of features in both the natural sample and the trigger. To address these problems, and inspired by the spatial attention mechanism, we introduce a novel backdoor attack named SATBA that is invisible and minimizes trigger feature loss, thereby improving both the attack success rate and model accuracy. It extracts data features through spatial attention to generate a trigger pattern correlated with the clean data, and then poisons clean images by using a U-type model to plant the trigger into the original data. We demonstrate the effectiveness of our attack against three popular image-classification DNNs on three standard datasets. In addition, we conduct extensive experiments on image similarity, showing that our attack provides practical stealthiness, which is critical for resisting backdoor defenses.
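The following is a minimal PyTorch sketch of this general idea, not the SATBA implementation itself: a spatial-attention map computed from the clean image weights where a small U-type (encoder-decoder) generator writes a sample-specific trigger. The module names, layer sizes, and the blending rule are illustrative assumptions.

```python
# Minimal sketch (illustrative only): attention-guided trigger injection.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Produces a per-pixel attention map from channel-pooled features."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_pool = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        max_pool, _ = x.max(dim=1, keepdim=True)    # (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))

class TriggerGenerator(nn.Module):
    """Tiny encoder-decoder ("U-type") that produces a sample-specific trigger."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dec(self.enc(x))                # residual trigger in [-1, 1]

def poison(images: torch.Tensor, attn: SpatialAttention, gen: TriggerGenerator,
           strength: float = 0.05) -> torch.Tensor:
    """Blend the generated trigger into clean images, weighted by spatial attention."""
    mask = attn(images)
    trigger = gen(images)
    return torch.clamp(images + strength * mask * trigger, 0.0, 1.0)

# Example: poison a batch of 32x32 RGB images (e.g., CIFAR-10 sized).
imgs = torch.rand(8, 3, 32, 32)
poisoned = poison(imgs, SpatialAttention(), TriggerGenerator())
print(poisoned.shape)  # torch.Size([8, 3, 32, 32])
```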