The success of deep neural network (DNN) models has been accompanied by growing threats to their integrity. A recent threat is the Trojan attack, in which an attacker interferes with the training pipeline by inserting triggers into some of the training samples and trains the model to act maliciously only on samples that contain the trigger. Because the triggers are known only to the attacker, detecting Trojan networks is challenging. Existing Trojan detectors make strong assumptions about the types of triggers and attacks. We propose a detector based on an analysis of intrinsic DNN properties that are affected by the Trojaning process. For a comprehensive analysis, we develop Odysseus, the most diverse dataset to date, with over 3,000 clean and Trojan models. Odysseus covers a large spectrum of attacks, generated by leveraging the versatility in trigger designs and source-to-target class mappings. Our analysis shows that Trojan attacks affect the classifier margin and the shape of the decision boundary around the manifold of clean data. Exploiting these two factors, we propose an efficient Trojan detector that operates without any knowledge of the attack and significantly outperforms existing methods. Through a comprehensive set of experiments, we demonstrate the efficacy of the detector across model architectures, on unseen triggers, and on regularized models.
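To make the attack setting concrete, the following is a minimal sketch of how triggers might be inserted into a fraction of the training samples under a single source-to-target class mapping. It is an illustration only and not the procedure used to build Odysseus: the function name `poison_dataset`, its parameters, the square corner patch used as the trigger, and the assumption that images are arrays of shape (N, H, W) with values in [0, 1] are all hypothetical choices made for this example.

```python
import numpy as np


def poison_dataset(images, labels, source_class, target_class,
                   poison_fraction=0.1, patch_size=3, seed=0):
    """Stamp a trigger patch on some source-class images and relabel them.

    Hypothetical helper: the trigger here is a bright square in the
    bottom-right corner, which is only one of many possible designs.
    """
    rng = np.random.default_rng(seed)
    # Work on copies so the clean dataset is left untouched.
    images = images.copy()
    labels = labels.copy()

    # Pick a random subset of the source-class samples to poison.
    source_idx = np.flatnonzero(labels == source_class)
    n_poison = int(round(poison_fraction * len(source_idx)))
    chosen = rng.choice(source_idx, size=n_poison, replace=False)

    # Insert the trigger (assumes pixel values lie in [0, 1]).
    images[chosen, -patch_size:, -patch_size:] = 1.0
    # Map the triggered samples to the attacker's target class.
    labels[chosen] = target_class
    return images, labels, chosen


# Toy data: 20 blank 8x8 images, 10 per class.
images = np.zeros((20, 8, 8), dtype=np.float32)
labels = np.array([0] * 10 + [1] * 10)

poisoned_images, poisoned_labels, chosen = poison_dataset(
    images, labels, source_class=0, target_class=1, poison_fraction=0.3
)
print("poisoned sample indices:", sorted(chosen.tolist()))
```

A model trained on `poisoned_images` and `poisoned_labels` would see the trigger only on the relabeled samples, which is what lets it behave normally on clean inputs. A detector of the kind described above would not have access to `chosen` or to the trigger pattern.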