A recent line of work has shown that deep networks are highly susceptible to backdoor data poisoning attacks. Specifically, by injecting a small amount of malicious data into the training distribution, an adversary gains the ability to control the model's behavior during inference. In this work, we propose an iterative training procedure for removing poisoned data from the training set. Our approach consists of two steps. We first train an ensemble of weak learners to automatically discover distinct subpopulations in the training set. We then leverage a boosting framework to recover the clean data. Empirically, our method successfully defends against several state-of-the-art backdoor attacks, including both clean and dirty label attacks. We also present results from an independent third-party evaluation including a recent \textit{adaptive} poisoning adversary. The results indicate our approach is competitive with existing defenses against backdoor attacks on deep neural networks, and significantly outperforms the state-of-the-art in several scenarios.
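To make the two-step procedure concrete, the following is a minimal sketch of an ensemble-based filtering step, not the authors' exact method: several weak learners are fit on held-out splits of the (possibly poisoned) training set, and each example is scored by how often the learners reproduce its label; low-agreement examples are dropped as suspected poison. The function name, thresholds, and choice of logistic-regression weak learners are illustrative assumptions only.

\begin{verbatim}
# Hypothetical sketch of an ensemble/boosting-style poison filter.
# Assumptions: feature matrix X, labels y, sklearn weak learners.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def ensemble_filter(X, y, rounds=5, n_folds=3, keep_ratio=0.6, seed=0):
    """Score each example by how often held-out weak learners agree
    with its given label, then keep the highest-scoring fraction."""
    votes = np.zeros(len(y), dtype=float)
    for r in range(rounds):
        skf = StratifiedKFold(n_splits=n_folds, shuffle=True,
                              random_state=seed + r)
        for fit_idx, held_idx in skf.split(X, y):
            # Weak learner trained on one split, evaluated on the held-out part.
            weak = LogisticRegression(max_iter=500).fit(X[fit_idx], y[fit_idx])
            votes[held_idx] += weak.predict(X[held_idx]) == y[held_idx]
    votes /= rounds
    # Keep the examples the ensemble most consistently agrees on.
    cutoff = np.quantile(votes, 1.0 - keep_ratio)
    return votes >= cutoff  # boolean mask marking (likely) clean examples

# Usage: mask = ensemble_filter(X, y); X_clean, y_clean = X[mask], y[mask]
\end{verbatim}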