The ubiquity of deep neural networks (DNNs), cloud-based training, and transfer learning is giving rise to a new cybersecurity frontier in which insecure DNNs have `structural malware' (i.e., compromised weights and activation pathways). In particular, DNNs can be designed to have backdoors that allow an adversary to easily and reliably fool an image classifier by adding a pattern of pixels called a trigger. Backdoors are generally difficult to detect, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data). Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, `fingerprints' its nonlinearity, and allows us to detect backdoors (if present). Our approach involves studying how a DNN responds to noise-infused images as the noise intensity is varied, and we summarize this response with titration curves. We find that DNNs with backdoors are more sensitive to input noise and respond in a characteristic way that reveals the backdoor and where it leads (its `target'). Our empirical results demonstrate that we can detect backdoors accurately and with high confidence, orders of magnitude faster than existing approaches (seconds versus hours).
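To make the idea of a titration curve concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the noise model or the summary statistic, so this sketch assumes additive Gaussian noise and records, for each noise level, the fraction of inputs that the model classifies with softmax confidence above a threshold. The names `titration_curve`, `toy_model`, and `threshold` are hypothetical, and the random linear classifier only stands in for a real DNN.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def titration_curve(predict_proba, images, sigmas, threshold=0.95, seed=0):
    """For each noise level, return the fraction of noisy inputs that are
    predicted with confidence above `threshold`, plus a per-class count of
    those confident predictions (assumed summary statistic)."""
    rng = np.random.default_rng(seed)
    n_classes = predict_proba(images[:1]).shape[1]
    scores = np.empty(len(sigmas))
    class_hist = np.zeros((len(sigmas), n_classes))
    for i, sigma in enumerate(sigmas):
        # Assumption: additive Gaussian noise scaled by sigma.
        noisy = images + sigma * rng.standard_normal(images.shape)
        probs = predict_proba(noisy)
        confident = probs.max(axis=1) > threshold
        scores[i] = confident.mean()
        # Which classes absorb the confident predictions at this noise level.
        labels = probs.argmax(axis=1)[confident]
        class_hist[i] = np.bincount(labels, minlength=n_classes)
    return scores, class_hist

# Hypothetical stand-in for a trained classifier: a random linear softmax model.
_rng = np.random.default_rng(1)
_W = _rng.standard_normal((64, 10))

def toy_model(x):
    return softmax(x.reshape(len(x), -1) @ _W)

images = _rng.random((200, 8, 8))      # placeholder "images" in [0, 1)
sigmas = np.linspace(0.0, 10.0, 6)     # noise intensities to titrate over
scores, class_hist = titration_curve(toy_model, images, sigmas)
```

Under these assumptions, `scores` plotted against `sigmas` gives one curve per model, and a row of `class_hist` dominated by a single class would point to a candidate target. Whether a given curve shape indicates a backdoor would have to be calibrated against models known to be clean.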