Deep neural network (DNN) classifiers are powerful tools that drive a broad spectrum of important applications, from image recognition to autonomous vehicles. Unfortunately, DNNs are known to be vulnerable to adversarial attacks that affect virtually all state-of-the-art models. These attacks make small, imperceptible modifications to inputs that are sufficient to induce the DNN to produce a wrong classification. In this paper, we propose a novel, lightweight adversarial correction and/or detection mechanism for image classifiers that relies on undervolting (running a chip at a voltage slightly below its safe margin). We propose applying controlled undervolting to the chip running the inference process in order to introduce a limited number of compute errors. We show that these errors disrupt the adversarial input in a way that can be used either to correct the classification or to detect the input as adversarial. We evaluate the proposed solution on an FPGA design and through software simulation. Across 10 attacks on two popular DNNs, we show an average detection rate of 80% to 95%.
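To make the detection idea concrete, the following minimal Python sketch simulates the core mechanism described above: injecting a small number of compute errors during inference and flagging an input as adversarial if its predicted label is unstable under those errors. Everything here is an illustrative assumption, not the paper's actual implementation: the toy two-layer model, the `error_rate` fault model standing in for undervolting, and the `detect_adversarial` decision rule with its `threshold` are all hypothetical.

```python
import numpy as np

# Illustrative sketch: approximate undervolting-induced faults by corrupting a
# random fraction of hidden activations during inference. The model weights,
# error model, and detection rule are stand-ins, not the paper's method.
rng = np.random.default_rng(0)

# Toy 2-layer classifier (random weights stand in for a trained DNN).
W1, b1 = rng.normal(size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)), np.zeros(10)

def forward(x, error_rate=0.0):
    """Forward pass; with error_rate > 0, a random subset of hidden
    activations is corrupted to mimic undervolting compute errors."""
    h = np.maximum(x @ W1 + b1, 0.0)
    if error_rate > 0.0:
        mask = rng.random(h.shape) < error_rate
        h = np.where(mask, rng.normal(scale=h.std() + 1e-8, size=h.shape), h)
    return int(np.argmax(h @ W2 + b2))

def detect_adversarial(x, n_trials=16, error_rate=0.01, threshold=0.5):
    """Hypothetical decision rule: flag the input as adversarial if the label
    under injected errors disagrees with the clean label too often."""
    clean_label = forward(x)
    flips = sum(forward(x, error_rate) != clean_label for _ in range(n_trials))
    return flips / n_trials > threshold

x = rng.normal(size=784)  # stand-in for a (possibly adversarial) input image
print(detect_adversarial(x))
```

A majority vote over the error-injected predictions could likewise serve as the "correction" variant mentioned in the abstract; the threshold-based rule above only covers the detection case.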