With the growing popularity of Artificial Intelligence (AI) techniques, an increasing number of backdoor injection attacks have been designed to maliciously threaten Deep Neural Networks (DNNs) deployed in safety-critical systems. Although various defense methods can effectively erase backdoor triggers from DNNs, they still suffer from a non-negligible Attack Success Rate (ASR) as well as a notable loss in benign accuracy. Inspired by the observation that a backdoored DNN forms new clusters in its feature space for poisoned data, in this paper we propose a novel backdoor defense method named MCL based on model-contrastive learning. MCL implements backdoor defense in two steps. First, we apply a backdoor trigger synthesis technique to invert the trigger. Next, the inverted trigger is used to construct poisoned data so that model-contrastive learning can be applied, which pulls the feature representations of poisoned data close to those of benign data while pushing them away from the original poisoned feature representations. Extensive experiments against five state-of-the-art attack methods on multiple benchmark datasets show that, using only 5% of clean data, MCL is more effective at reducing backdoor threats while maintaining high accuracy on benign data; the benign accuracy degrades by less than 1%.
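To make the model-contrastive objective concrete, below is a minimal sketch of one plausible formulation, assuming an InfoNCE-style contrastive loss in which features of trigger-stamped inputs from the model being purified are pulled toward features of the corresponding benign inputs and pushed away from the features produced by the frozen backdoored model. All names (the tensors, the `temperature` parameter) are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def model_contrastive_loss(z_poisoned, z_benign, z_backdoor, temperature=0.5):
    """Sketch of an InfoNCE-style model-contrastive loss (illustrative only).

    z_poisoned: features of trigger-stamped inputs from the model being purified.
    z_benign:   features of the corresponding clean inputs (positive pair).
    z_backdoor: features of the same trigger-stamped inputs from the frozen,
                original backdoored model (negative pair).
    """
    # Cosine similarities scaled by a temperature hyperparameter (assumed).
    sim_pos = F.cosine_similarity(z_poisoned, z_benign, dim=-1) / temperature
    sim_neg = F.cosine_similarity(z_poisoned, z_backdoor, dim=-1) / temperature

    # Treat the benign representation as the correct "class": minimizing the
    # cross-entropy pulls poisoned features toward benign ones and away from
    # the backdoored model's poisoned features.
    logits = torch.stack([sim_pos, sim_neg], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

In such a setup, this loss would typically be combined with a standard classification loss on the small clean set (the 5% of clean data mentioned above) during fine-tuning of the backdoored model; that combination is an assumption here, not a detail stated in the abstract.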