The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Existing DNN testing methods are mainly designed to find incorrect corner-case behaviors in adversarial settings, but they fail to discover backdoors crafted by strong trojan attacks. By observing trojan network behaviors, we find that they are not reflected by a single compromised neuron alone, as proposed by previous work, but are attributable to critical neural paths characterized by the activation intensity and frequency of multiple neurons. This work formulates the DNN backdoor testing problem and proposes the CatchBackdoor framework. Via differential fuzzing of critical neurons from a small number of benign examples, we identify the trojan paths, particularly the critical ones, and generate backdoor testing examples by simulating the critical neurons in the identified paths. Extensive experiments demonstrate the superiority of CatchBackdoor, which achieves higher detection performance than existing methods. CatchBackdoor is better at detecting backdoors implanted by stealthy blending and adaptive attacks, which existing methods fail to detect. Moreover, our experiments show that CatchBackdoor may reveal potential backdoors in models from Model Zoo.