Adversarial patch attacks, which inject arbitrary distortions within a bounded region of an image, can trigger misclassification in deep neural networks (DNNs). These attacks are robust (i.e., physically realizable) and universally malicious, and hence represent a severe security threat to real-world DNN-based systems. This work proposes Jujutsu, a two-stage technique to detect and mitigate robust and universal adversarial patch attacks. We first observe that patch attacks often exert a large influence on the prediction output in order to dominate the prediction on any input, and Jujutsu is built to expose this behavior for effective attack detection. For mitigation, we observe that patch attacks corrupt only a localized region while the remaining contents are unperturbed; based on this, Jujutsu leverages GAN-based image inpainting to synthesize the semantic contents of the pixels corrupted by the attack and reconstruct the ``clean'' image for correct prediction. We evaluate Jujutsu on four diverse datasets and show that it achieves superior performance, significantly outperforming four leading defenses. Jujutsu can further defend against physical-world attacks, attacks that target diverse classes, and adaptive attacks. Our code is available at https://github.com/DependableSystemsLab/Jujutsu.
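To illustrate the mitigation stage described above, the sketch below masks out the region flagged as the adversarial patch, inpaints it, and re-classifies the reconstructed image. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes OpenCV's diffusion-based inpainting for the GAN-based inpainter used by Jujutsu, uses a generic pretrained ResNet-50 as the victim classifier, and assumes the patch mask has already been localized by the detection stage.

```python
# Minimal sketch of the mitigation stage (illustrative only):
# mask out the pixels flagged as the adversarial patch, synthesize
# replacement content via inpainting, and re-classify the result.
# NOTE: OpenCV's diffusion-based inpainting stands in for the GAN-based
# inpainter described in the paper; `patch_mask` is assumed to come from
# the detection stage.
import cv2
import numpy as np
import torch
from torchvision import models, transforms


def mitigate_and_classify(image_bgr: np.ndarray, patch_mask: np.ndarray) -> int:
    """image_bgr: HxWx3 uint8 image; patch_mask: HxW uint8, 255 where the patch is."""
    # Synthesize plausible content in the corrupted pixels.
    reconstructed = cv2.inpaint(image_bgr, patch_mask, 3, cv2.INPAINT_TELEA)

    # Re-run the classifier on the reconstructed ("clean") image.
    preprocess = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

    rgb = cv2.cvtColor(reconstructed, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        logits = model(preprocess(rgb).unsqueeze(0))
    return int(logits.argmax(dim=1))
```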