The growing importance of deep neural networks (DNNs), and of the cloud services used to train them, gives bad actors more incentive and opportunity to insert backdoors that alter the behavior of trained models. In this paper, we introduce a novel backdoor detection method that uses independent vector analysis (IVA) to extract features from the weights of pre-trained DNNs and then passes those features to a machine learning classifier. Compared with other detection techniques, this approach has several benefits: it requires no training data, applies across domains, works with a wide range of network architectures, makes no assumptions about the nature of the triggers used to change network behavior, and is highly scalable. We describe the detection pipeline and then demonstrate results on two computer vision datasets covering image classification and object detection. Our method is both more efficient and more accurate than the competing algorithms, helping to ensure the safe application of deep learning and AI.