Neural network implementations are known to be vulnerable to physical attack vectors such as fault injection attacks. So far, such attacks have been utilized only during the inference phase with the intention of causing a misclassification. In this work, we explore a novel attack paradigm: injecting faults during the training phase of a neural network such that the resulting network can be attacked during deployment without any further faulting. In particular, we discuss attacks against ReLU activation functions that make it possible to generate a family of malicious inputs, called fooling inputs, to be used at inference time to induce controlled misclassifications. Such malicious inputs are obtained by solving a system of linear equations that causes a particular behaviour on the attacked activation functions, similar to the one induced during training through faulting. We call such attacks fooling backdoors, as the fault attacks at the training phase inject backdoors into the network that allow an attacker to produce fooling inputs. We evaluate our approach against multi-layer perceptron networks and convolutional networks on a popular image classification task, obtaining high attack success rates (from 60% to 100%) and high classification confidence when as few as 25 neurons are attacked, while preserving high accuracy on the originally intended classification task.
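To make the input-crafting step concrete, the following is a minimal sketch of how a fooling input could be recovered by solving a linear system over the attacked neurons' pre-activations. The dimensions, the weight values, and the fault model (a faulted ReLU targeted here by driving its pre-activation to zero) are illustrative assumptions, not the exact construction evaluated in the paper.

```python
import numpy as np

# Hypothetical single-layer illustration. A fault on a ReLU is modeled here as
# forcing the neuron into a fixed state; reproducing that state at inference
# time reduces to hitting chosen pre-activation targets on the attacked neurons.
rng = np.random.default_rng(0)

n_in, n_attacked = 784, 25                 # e.g. flattened 28x28 image, 25 attacked neurons
W = rng.normal(size=(n_attacked, n_in))    # first-layer weights (assumed known to the attacker)
b = rng.normal(size=n_attacked)            # first-layer biases

# Target pre-activations t on the attacked neurons: solve W @ x + b = t for x.
t = np.zeros(n_attacked)                   # assumed target mimicking the faulted behaviour

# The system is underdetermined (n_in > n_attacked), so least squares returns
# the minimum-norm solution; adding any vector from the null space of W yields
# further members of the family of fooling inputs.
x_fool, *_ = np.linalg.lstsq(W, t - b, rcond=None)

assert np.allclose(W @ x_fool + b, t, atol=1e-8)
```

Because the system has many more unknowns than equations, the attacker obtains a whole family of solutions rather than a single input, which matches the abstract's claim that a family of fooling inputs can be generated from one set of attacked neurons.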