Backdoor (Trojan) attacks are emerging threats against deep neural networks (DNNs). An attacked DNN predicts an attacker-desired target class whenever a test sample from any source class is embedded with the backdoor pattern, while correctly classifying clean (attack-free) test samples. Existing backdoor defenses have shown success in detecting whether a DNN is attacked and in reverse-engineering the backdoor pattern in a "post-training" regime: the defender has access to the DNN to be inspected and to a small, clean dataset collected independently, but has no access to the (possibly poisoned) training set of the DNN. However, these defenses neither catch culprits in the act of triggering the backdoor mapping nor mitigate the backdoor attack at test time. In this paper, we propose an "in-flight" defense against backdoor attacks on image classification that 1) detects use of a backdoor trigger at test time and 2) infers the class of origin (source class) of a detected trigger example. The effectiveness of our defense is demonstrated experimentally against several strong backdoor attacks.
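To make the threat model concrete, the following is a minimal sketch of how a patch-style (BadNets-like) trigger is embedded into a test image. The function name `embed_patch_trigger`, the trigger shape, and its placement are illustrative assumptions; this does not reproduce the specific attacks evaluated in the paper or the proposed defense itself.

```python
import numpy as np

def embed_patch_trigger(image: np.ndarray, patch: np.ndarray,
                        top_left: tuple = (0, 0)) -> np.ndarray:
    """Overwrite a small region of `image` with a patch-style backdoor
    trigger (an assumed BadNets-like attack, for illustration only)."""
    r, c = top_left
    h, w = patch.shape[:2]
    out = image.copy()
    out[r:r + h, c:c + w] = patch
    return out

# Toy usage: a 32x32 RGB image with a 3x3 white-square trigger in a corner.
image = np.random.rand(32, 32, 3).astype(np.float32)
patch = np.ones((3, 3, 3), dtype=np.float32)
backdoored = embed_patch_trigger(image, patch, top_left=(29, 29))

# A successfully attacked classifier f would then satisfy, for most clean
# samples x from a source class:
#   f(x) == true label of x                          (clean accuracy intact)
#   f(embed_patch_trigger(x, patch)) == target class (backdoor mapping)
```

An "in-flight" defense in the sense of the abstract would sit between such a test input and the classifier's output, flagging inputs that exercise the backdoor mapping rather than only inspecting the model offline.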