State-of-the-art deep neural networks (DNNs) have been shown to be vulnerable to adversarial manipulation and backdoor attacks. Backdoored models deviate from expected behavior on inputs containing predefined triggers while retaining performance on clean data. Recent works focus on software simulation of backdoor injection during the inference phase by modifying network weights, which we find is often unrealistic in practice due to hardware restrictions. In contrast, in this work we present, for the first time, an end-to-end backdoor injection attack realized on actual hardware against a classifier model, using Rowhammer as the fault-injection method. To this end, we first investigate the viability of backdoor injection attacks in real-life deployments of DNNs on hardware and address the practical issues of a hardware implementation from a novel optimization perspective. We are motivated by the fact that vulnerable memory locations are very rare, device-specific, and sparsely distributed. Consequently, we propose a novel network training algorithm based on constrained optimization to achieve a realistic backdoor injection attack in hardware. By modifying parameters uniformly across the convolutional and fully-connected layers and jointly optimizing the trigger pattern, we achieve state-of-the-art attack performance with fewer bit flips. For instance, our method on a hardware-deployed ResNet-20 model trained on CIFAR-10 achieves over 89% test accuracy and a 92% attack success rate by flipping only 10 out of 2.2 million bits.
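The following is a minimal, hypothetical sketch of the kind of constrained backdoor training the abstract describes: only a small, fixed set of parameter entries (standing in for memory cells an attacker could flip with Rowhammer) is allowed to change, and the trigger pattern is optimized jointly with them. It is not the authors' exact algorithm; the function name, loss weighting, and masking scheme are illustrative assumptions.

```python
# Hypothetical sketch of sparsity-constrained backdoor training (PyTorch).
# Assumed names: backdoor_step, mask_per_param, trigger, target_class, lam.
import torch
import torch.nn.functional as F


def backdoor_step(model, mask_per_param, trigger, clean_x, clean_y,
                  target_class, opt, lam=1.0):
    """One training step under a sparsity constraint.

    mask_per_param: dict mapping parameter name -> {0,1} mask selecting the
        few entries the attacker is allowed to modify (e.g., those backed by
        flippable memory cells found on the target device).
    trigger: learnable tensor added to inputs to activate the backdoor.
    """
    # Triggered inputs should be pushed toward the attacker's target class,
    # while clean inputs keep their original labels.
    poisoned_x = torch.clamp(clean_x + trigger, 0.0, 1.0)
    target_y = torch.full_like(clean_y, target_class)

    loss_clean = F.cross_entropy(model(clean_x), clean_y)
    loss_bd = F.cross_entropy(model(poisoned_x), target_y)
    loss = loss_clean + lam * loss_bd

    opt.zero_grad()
    loss.backward()

    # Enforce the constraint: zero out gradients outside the allowed set,
    # so only the selected parameters (and the trigger) are updated.
    for name, p in model.named_parameters():
        if p.grad is not None:
            p.grad *= mask_per_param.get(name, torch.zeros_like(p))

    opt.step()
    return loss.item()
```

Under these assumptions, the trigger would be a leaf tensor with requires_grad=True and the optimizer would cover both the model parameters and the trigger, e.g. opt = torch.optim.Adam(list(model.parameters()) + [trigger], lr=1e-3); quantizing the resulting weight changes to actual bit flips is a separate step not shown here.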