Neural networks have achieved state-of-the-art performance on many tasks, including applications in safety- and security-critical systems. Researchers have also discovered multiple security issues associated with neural networks. One of them is the backdoor attack, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a trigger. Existing defense approaches mostly focus on detecting whether a neural network is 'backdoored' based on heuristics, e.g., activation patterns. To the best of our knowledge, the only line of work that certifies the absence of backdoors is based on randomized smoothing, which is known to significantly reduce neural network performance. In this work, we propose an approach to verify whether a given neural network is free of backdoors with a certain level of attack success rate. Our approach integrates statistical sampling with abstract interpretation. The experimental results show that our approach effectively either verifies the absence of backdoors or generates backdoor triggers.
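As a minimal sketch of the statistical-sampling ingredient (not the paper's implementation), the code below estimates the attack success rate of a candidate trigger on sampled inputs, with the sample size chosen via Hoeffding's inequality; `model`, `loader`, `trigger`, `mask`, and `target` are hypothetical placeholders.

```python
# A minimal sketch (not the paper's implementation) of statistical
# sampling: estimating a trigger's attack success rate on sampled inputs.
# `model`, `loader`, `trigger`, `mask`, and `target` are hypothetical.
import math

import torch

def estimate_attack_success_rate(model, loader, trigger, mask, target,
                                 epsilon=0.01, delta=0.01):
    """Estimate how often stamping `trigger` (applied where `mask` == 1)
    flips the model's prediction to class `target`.

    Hoeffding's inequality gives a sample size n such that the estimate
    is within `epsilon` of the true rate with probability >= 1 - delta.
    """
    n_required = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
    hits, seen = 0, 0
    model.eval()
    with torch.no_grad():
        for x, _ in loader:
            stamped = (1 - mask) * x + mask * trigger  # stamp the trigger
            preds = model(stamped).argmax(dim=1)
            hits += (preds == target).sum().item()
            seen += x.shape[0]
            if seen >= n_required:
                break
    return hits / seen
```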
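Likewise, a toy sketch of the abstract-interpretation ingredient, assuming an interval (box) abstract domain and a two-layer fully connected network; the abstract domain actually used by the approach may differ, and all names below are illustrative.

```python
# A toy sketch of abstract interpretation with an interval (box) domain;
# the paper's actual abstract domain may differ. Names are illustrative.
import torch

def interval_affine(W, b, lo, hi):
    """Propagate the box [lo, hi] through x -> W @ x + b."""
    W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_relu(lo, hi):
    """ReLU is monotone, so it applies to the bounds elementwise."""
    return lo.clamp(min=0), hi.clamp(min=0)

def no_trigger_forces_target(x, mask, W1, b1, W2, b2, target):
    """Certify that, for a flat input `x`, no trigger value in [0, 1]
    stamped where `mask` == 1 can force the prediction `target`."""
    lo = (1 - mask) * x + mask * 0.0  # trigger pixels at their minimum
    hi = (1 - mask) * x + mask * 1.0  # trigger pixels at their maximum
    lo, hi = interval_relu(*interval_affine(W1, b1, lo, hi))
    lo, hi = interval_affine(W2, b2, lo, hi)
    others = torch.arange(lo.numel()) != target
    # Safe if some other logit's lower bound beats the target's upper bound.
    return bool((lo[others].max() > hi[target]).item())
```

Aggregating such per-input certificates over sampled inputs is one plausible way the two ingredients could combine to bound a backdoor's attack success rate.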