Deep neural network (DNN) classifiers are powerful tools that drive a broad spectrum of important applications, from image recognition to autonomous vehicles. Unfortunately, DNNs are known to be vulnerable to adversarial attacks that affect virtually all state-of-the-art models. These attacks make small, imperceptible modifications to inputs that are sufficient to induce the DNN to misclassify them. In this paper we propose a novel, lightweight adversarial correction and/or detection mechanism for image classifiers that relies on undervolting (running a chip at a voltage slightly below its safe operating margin). We propose using controlled undervolting of the chip running the inference process in order to introduce a limited number of compute errors. We show that these errors disrupt the adversarial input in a way that can be used either to correct the classification or to detect the input as adversarial. We evaluate the proposed solution in an FPGA design and through software simulation. We evaluate 10 attacks and show average detection rates of 77% and 90% on two popular DNNs.
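To make the detection idea concrete, below is a minimal software-simulation sketch of the mechanism described above, not the authors' implementation: random perturbations of a small fraction of intermediate activations stand in for undervolting-induced compute errors, and an input is flagged as adversarial when the error-injected predictions disagree with the clean one. The function names, the ResNet-18 model, the perturbed layer, and the error-rate and trial-count values are all illustrative assumptions.

```python
# Hypothetical sketch: simulate undervolting-style compute errors by randomly
# perturbing activations at one layer, then compare faulty vs. clean predictions.
import torch
import torch.nn as nn
import torchvision.models as models


def inject_errors(output: torch.Tensor, error_rate: float = 1e-3) -> torch.Tensor:
    """Flip the sign of a random subset of activations (a crude error model)."""
    mask = torch.rand_like(output) < error_rate
    return torch.where(mask, -output, output)


def detect_adversarial(model: nn.Module, x: torch.Tensor, layer: nn.Module,
                       error_rate: float = 1e-3, trials: int = 5) -> bool:
    """Return True if a majority of error-injected runs change the predicted label."""
    model.eval()
    with torch.no_grad():
        clean_pred = model(x).argmax(dim=1)
        # Forward hook replaces the layer's output with an error-injected copy.
        handle = layer.register_forward_hook(
            lambda m, inp, out: inject_errors(out, error_rate))
        try:
            disagreements = sum(
                int((model(x).argmax(dim=1) != clean_pred).item())
                for _ in range(trials))
        finally:
            handle.remove()
    return disagreements > trials // 2


if __name__ == "__main__":
    model = models.resnet18(weights=None)   # illustrative classifier (untrained)
    x = torch.randn(1, 3, 224, 224)         # placeholder input image
    print(detect_adversarial(model, x, model.layer3))
```

In this sketch the error model (sign flips at a fixed rate in one layer) is only a stand-in; the paper's mechanism derives its errors from actual undervolting of the inference hardware, evaluated on an FPGA design as well as in simulation.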