Although deep neural networks (DNNs) have achieved great success in various computer vision tasks, they have recently been found to be vulnerable to adversarial attacks. In this paper, we focus on the so-called \textit{backdoor attack}, which injects a backdoor trigger into a small portion of the training data (also known as data poisoning) so that the trained DNN misclassifies examples containing this trigger. Specifically, we carefully study the effect of both real and synthetic backdoor attacks on the internal responses of vanilla and backdoored DNNs through the lens of Grad-CAM. Moreover, we show that the backdoor attack induces a significant bias in neuron activation, measured by the $\ell_\infty$ norm of an activation map, compared to its $\ell_1$ and $\ell_2$ norms. Spurred by these results, we propose \textit{$\ell_\infty$-based neuron pruning} to remove the backdoor from a backdoored DNN. Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
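To make the defense concrete, the following is a minimal PyTorch sketch of one way $\ell_\infty$-based neuron scoring and channel pruning could be implemented. The function names, the choice of scoring channels on a held-out clean loader, and the pruning ratio are illustrative assumptions for exposition and do not reproduce the paper's exact procedure.

\begin{verbatim}
import torch

def linf_neuron_scores(model, layer, clean_loader, device="cpu"):
    """Score each channel of `layer` by the l_inf norm of its activation
    map, averaged over clean images (hypothetical helper, assumed setup)."""
    acts = []

    def hook(_, __, output):
        # output: (N, C, H, W); take the l_inf norm over each spatial map
        acts.append(output.detach().abs().flatten(2).max(dim=2).values)

    handle = layer.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x.to(device))
    handle.remove()
    return torch.cat(acts).mean(dim=0)  # (C,) per-channel score

def prune_by_linf(layer, scores, ratio=0.05):
    """Zero out the channels with the largest l_inf scores by masking the
    layer's weights (a simple stand-in for removing those neurons)."""
    num_prune = max(1, int(ratio * scores.numel()))
    idx = torch.topk(scores, num_prune).indices
    with torch.no_grad():
        layer.weight[idx] = 0.0
        if layer.bias is not None:
            layer.bias[idx] = 0.0
    return idx

# Example usage (hypothetical model/layer names):
# scores = linf_neuron_scores(model, model.layer4[-1].conv2, clean_loader)
# pruned = prune_by_linf(model.layer4[-1].conv2, scores, ratio=0.05)
\end{verbatim}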