Deep neural networks (DNNs) are now the de facto choice for computer vision tasks such as image classification. However, their complexity and "black box" nature often render the systems they are deployed in vulnerable to a range of security threats. Successfully identifying such threats, especially in safety-critical real-world applications, is thus of utmost importance, but still very much an open problem. We present TESDA, a low-overhead, flexible, and statistically grounded method for online detection of attacks by exploiting the discrepancies they cause in the distributions of intermediate layer features of DNNs. Unlike most prior work, we require neither dedicated hardware to run in real time, nor the presence of a Trojan trigger to detect discrepancies in behavior. We empirically establish our method's usefulness and practicality across multiple architectures, datasets, and diverse attacks, consistently achieving detection coverage above 95% with operation count overheads as low as 1-2%.
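To make the general idea concrete, the sketch below is a minimal, hypothetical illustration (not the TESDA algorithm itself) of detecting attacks via intermediate-layer feature distributions: a forward hook captures features from one layer of a toy network, a Gaussian is fit to those features on clean calibration data, and test inputs whose Mahalanobis distance exceeds a clean-data percentile are flagged. The network, the choice of layer, the Mahalanobis statistic, and the 99th-percentile threshold are all assumptions for illustration only; the paper's actual statistics and overheads are described in the main text.

```python
# Hypothetical toy sketch (not TESDA itself): monitor one intermediate layer of a
# small CNN, fit a Gaussian to its features on clean data, and flag inputs whose
# Mahalanobis distance exceeds a clean-data percentile threshold.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in network; any feature extractor with an accessible intermediate layer works.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

features = {}
def hook(_module, _inp, out):
    features["z"] = out.detach()           # pooled intermediate features, shape (N, 8)

model[3].register_forward_hook(hook)        # monitor the Flatten output

def get_features(x):
    with torch.no_grad():
        model(x)
    return features["z"]

# "Clean" calibration data (random here purely for illustration).
clean = torch.randn(512, 3, 32, 32)
z_clean = get_features(clean)
mu = z_clean.mean(dim=0)
cov = torch.cov(z_clean.T) + 1e-4 * torch.eye(z_clean.shape[1])
cov_inv = torch.linalg.inv(cov)

def mahalanobis(z):
    d = z - mu
    return torch.einsum("ni,ij,nj->n", d, cov_inv, d)

# Threshold at the 99th percentile of clean-data scores.
threshold = torch.quantile(mahalanobis(z_clean), 0.99)

# At deployment time, flag inputs whose intermediate-feature statistics deviate.
test = torch.randn(16, 3, 32, 32) * 3.0     # crude stand-in for attacked inputs
scores = mahalanobis(get_features(test))
print("flagged as anomalous:", (scores > threshold).tolist())
```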