Deep Neural Networks are well known to be vulnerable to adversarial attacks and backdoor attacks, where minor modifications to the input can mislead the models into producing wrong results. Although defenses against adversarial attacks have been widely studied, investigation into mitigating backdoor attacks is still at an early stage. It is unknown whether there are any connections or common characteristics between the defenses against these two attacks. We conduct a comprehensive study of the connections between adversarial examples and backdoor examples of Deep Neural Networks to answer the question: can we detect backdoor examples using adversarial detection methods? Our insight is based on the observation that both adversarial examples and backdoor examples exhibit anomalies during the inference process that are highly distinguishable from benign samples. As a result, we revise four existing adversarial defense methods to detect backdoor examples. Extensive evaluations indicate that these approaches provide reliable protection against backdoor attacks, with higher accuracy than when detecting adversarial examples. These solutions also reveal the relations among adversarial examples, backdoor examples, and normal samples in terms of model sensitivity, activation space, and feature space, which enhances our understanding of the inherent features of these two attacks and the corresponding defense opportunities.