Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers. A victim classifier will predict to an attacker-desired target class whenever a test sample is embedded with the same backdoor pattern (BP) that was used to poison the classifier's training set. Detecting whether a classifier is backdoor-attacked is not easy in practice, especially when the defender is, e.g., a downstream user without access to the classifier's training set. This challenge can be addressed by a reverse-engineering defense (RED), an approach that has been shown to yield state-of-the-art performance in several domains. However, existing REDs are not applicable when there are only {\it two classes} or when {\it multiple attacks} are present. These scenarios are studied for the first time in the current paper, under the practical constraints that the defender has neither access to the classifier's training set nor supervision from clean reference classifiers trained for the same domain. We propose a detection framework based on BP reverse-engineering and a novel {\it expected transferability} (ET) statistic. We show that our ET statistic is effective {\it using the same detection threshold}, irrespective of the classification domain, the attack configuration, and the BP reverse-engineering algorithm that is used. The excellent performance of our method is demonstrated on six benchmark datasets. Notably, our detection framework is also applicable to multi-class scenarios with multiple attacks.
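As a minimal sketch of the idea behind the ET statistic, the following PyTorch snippet estimates the transferability of a reverse-engineered BP to held-out clean samples. The function name, the additive embedding, and the threshold value are illustrative assumptions for exposition, not the paper's exact estimator.

\begin{verbatim}
import torch

def transferability(classifier, pattern, src_samples, target_class):
    """Fraction of held-out clean source-class samples that the
    classifier assigns to the putative target class once the
    reverse-engineered pattern is embedded (illustrative only)."""
    classifier.eval()
    with torch.no_grad():
        # Additive embedding clipped to the valid input range; other
        # embedding functions (e.g., patch replacement) fit here too.
        x = torch.clamp(src_samples + pattern, 0.0, 1.0)
        preds = classifier(x).argmax(dim=1)
    return (preds == target_class).float().mean().item()

# Detection sketch: infer an attack if the expected transferability
# for some putative target class exceeds a fixed, domain-independent
# threshold (the value 0.5 here is illustrative).
THRESHOLD = 0.5
\end{verbatim}

Under these assumptions, aggregating this score over putative (source, target) class pairs yields an ET-style detection statistic that can be compared against a single shared threshold, which is what makes the detector agnostic to the domain, the attack configuration, and the reverse-engineering algorithm.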