Backdoor attacks have been shown to be a serious security threat against deep learning models, and detecting whether a given model has been backdoored has become a crucial task. Existing defenses are mainly built upon the observation that the backdoor trigger is usually of small size or affects the activation of only a few neurons. However, these observations do not hold in many cases, especially for advanced backdoor attacks, which hinders the performance and applicability of existing defenses. In this paper, we propose DTInspector, a backdoor defense built upon a new observation: an effective backdoor attack usually requires high prediction confidence on the poisoned training samples, so as to ensure that the trained model exhibits the targeted behavior with high probability. Based on this observation, DTInspector first learns a patch that changes the predictions of most high-confidence data, and then decides whether a backdoor exists by checking the ratio of prediction changes after applying the learned patch to the low-confidence data. Extensive evaluations on five backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.
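The final detection step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`prediction_change_ratio`, `patch_fn`), the toy arg-max "model", and the decision threshold are all hypothetical stand-ins for the learned patch and trained classifier.

```python
import numpy as np

def prediction_change_ratio(predict, patch_fn, low_conf_x):
    """Fraction of low-confidence samples whose predicted label flips
    after the learned patch is applied (hypothetical sketch of
    DTInspector's final check)."""
    before = predict(low_conf_x)
    after = predict(patch_fn(low_conf_x))
    return float(np.mean(before != after))

if __name__ == "__main__":
    # Toy demo: the "model" predicts the index of the largest feature.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 4))
    predict = lambda batch: batch.argmax(axis=1)
    # Stand-in "patch": boost feature 0 so most predictions flip to class 0,
    # mimicking a trigger-like perturbation.
    patch_fn = lambda batch: batch + np.array([5.0, 0.0, 0.0, 0.0])
    ratio = prediction_change_ratio(predict, patch_fn, x)
    BACKDOOR_THRESHOLD = 0.5  # hypothetical decision threshold
    print(f"change ratio = {ratio:.2f}, flagged = {ratio > BACKDOOR_THRESHOLD}")
```

A high change ratio on low-confidence data signals a backdoor; a clean model's low-confidence predictions should be far less uniformly swayed by a single patch.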