With the broad application of deep neural networks (DNNs), backdoor attacks have gradually attracted attention. Backdoor attacks are insidious: poisoned models perform well on benign samples and are triggered only by specific inputs, which cause the neural network to produce incorrect outputs. State-of-the-art backdoor attacks are implemented by data poisoning, i.e., the attacker injects poisoned samples into the dataset, and models trained on that dataset are infected with the backdoor. However, most triggers used in current studies are fixed patterns patched onto a small fraction of an image, and the poisoned samples are often clearly mislabeled, which makes them easy to detect by humans or by defense methods such as Neural Cleanse and SentiNet. Moreover, such triggers are difficult for DNNs to learn without mislabeling, since the networks may ignore small patterns. In this paper, we propose a generalized backdoor attack method based on the frequency domain, which can implant a backdoor without mislabeling samples or accessing the training process. It is invisible to humans and able to evade commonly used defense methods. We evaluate our approach in the no-label and clean-label cases on three datasets (CIFAR-10, STL-10, and GTSRB) under two popular scenarios (self-supervised learning and supervised learning). The results show that our approach achieves a high attack success rate (above 90%) on all tasks without significant performance degradation on the main tasks. We also evaluate how well our approach bypasses different kinds of defenses, including detection of training data (i.e., Activation Clustering), preprocessing of inputs (i.e., Filtering), detection of inputs (i.e., SentiNet), and detection of models (i.e., Neural Cleanse). The experimental results demonstrate that our approach shows excellent robustness to such defenses.
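The abstract does not specify how the frequency-domain trigger is constructed. As a minimal sketch of the general idea only, assuming a trigger that adds a fixed offset to a few mid-frequency coefficients of each channel's 2-D DCT, poisoning an image could look like the code below. The coefficient positions (`TRIGGER_POSITIONS`), the `strength` value, and all function names are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row gets the smaller normalization
    return c


def dct2(x: np.ndarray, c_h: np.ndarray, c_w: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT: transform columns with c_h, rows with c_w."""
    return c_h @ x @ c_w.T


def idct2(y: np.ndarray, c_h: np.ndarray, c_w: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT; the bases are orthonormal, so the inverse is the transpose."""
    return c_h.T @ y @ c_w


# Hypothetical trigger: a few mid-frequency (row, col) coefficient positions.
TRIGGER_POSITIONS = [(8, 12), (12, 8), (10, 10)]


def add_frequency_trigger(image: np.ndarray,
                          positions=TRIGGER_POSITIONS,
                          strength: float = 0.2) -> np.ndarray:
    """Add `strength` to selected DCT coefficients of every channel.

    image: float array of shape (H, W, C) with values in [0, 1].
    Returns the poisoned image, clipped back to [0, 1].
    """
    h, w, channels = image.shape
    c_h, c_w = dct_matrix(h), dct_matrix(w)
    poisoned = np.empty_like(image, dtype=np.float64)
    for ch in range(channels):
        coeffs = dct2(image[:, :, ch].astype(np.float64), c_h, c_w)
        for u, v in positions:
            coeffs[u, v] += strength  # fixed offset encodes the trigger
        poisoned[:, :, ch] = idct2(coeffs, c_h, c_w)
    return np.clip(poisoned, 0.0, 1.0)
```

Because every DCT basis function spans the whole image, the trigger is spread over all pixels rather than patched onto a small region. For a 32x32 image, each basis function has a peak amplitude of 2/32, so three offsets of 0.2 change any pixel by at most about 0.0375 on a [0, 1] scale. The label is left untouched, which matches the clean-label setting described above.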