Paper title: Backdoor Defense via Adaptively Splitting Poisoned Dataset

Abstract: Backdoor defenses have been studied to mitigate the threat of deep neural networks (DNNs) being maliciously altered by backdoor attacks. Since DNNs are often trained on external data from untrusted third parties, a robust defense strategy at the training stage is essential. We argue that the core of a training-time defense is to identify poisoned samples and handle them properly. In this work, we summarize training-time defenses under a unified framework: splitting the poisoned dataset into two data pools. Under this framework, we propose an adaptively splitting dataset-based defense (ASD). Concretely, we apply a loss-guided split and a meta-learning-inspired split to dynamically update the two data pools. With the resulting clean and polluted data pools, ASD successfully defends against backdoor attacks during training. Extensive experiments on multiple benchmark datasets and DNN models against six state-of-the-art backdoor attacks demonstrate the superiority of ASD. Our code is available at https://github.com/KuofengGao/ASD.
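To make the loss-guided split concrete, below is a minimal PyTorch sketch of the general idea, not the authors' implementation (see the repository above for that). The function name, the `clean_ratio` knob, and the batch size are illustrative assumptions: samples the current model fits with the lowest per-sample loss are routed to the clean pool, and the rest to the polluted pool.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def loss_guided_split(model, dataset, clean_ratio=0.5, device="cpu"):
    """Split a (possibly poisoned) dataset into a clean pool and a polluted pool.

    Samples with the lowest per-sample loss under the current model are treated
    as clean; `clean_ratio` is an illustrative knob, not a value from the paper.
    """
    model.eval()
    per_sample_losses = []
    loader = DataLoader(dataset, batch_size=128, shuffle=False)
    with torch.no_grad():
        for inputs, targets in loader:
            logits = model(inputs.to(device))
            # Per-sample cross-entropy (no reduction), collected on the CPU.
            losses = F.cross_entropy(logits, targets.to(device), reduction="none")
            per_sample_losses.append(losses.cpu())
    per_sample_losses = torch.cat(per_sample_losses)

    # Rank samples by loss: the lowest-loss samples go to the clean pool.
    order = torch.argsort(per_sample_losses)
    num_clean = int(clean_ratio * len(order))
    clean_pool = Subset(dataset, order[:num_clean].tolist())
    polluted_pool = Subset(dataset, order[num_clean:].tolist())
    return clean_pool, polluted_pool
```

Note that ASD itself updates the two pools dynamically during training; this one-shot split only illustrates the loss-guided criterion.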