Adversarial patch attacks against image classification deep neural networks (DNNs), which inject arbitrary distortions within a bounded region of an image, can generate adversarial perturbations that are robust (i.e., remain adversarial in the physical world) and universal (i.e., remain adversarial on any input). Such attacks can lead to severe consequences in real-world DNN-based systems. This work proposes Jujutsu, a technique to detect and mitigate robust and universal adversarial patch attacks. For detection, Jujutsu exploits the attacks' universal property: it first locates the region of the potential adversarial patch, and then strategically transfers it to a dedicated region in a new image to determine whether it is truly malicious. For attack mitigation, Jujutsu leverages the attacks' localized nature via image inpainting, synthesizing the semantic contents of the pixels corrupted by the attack and reconstructing the ``clean'' image. We evaluate Jujutsu on four diverse datasets (ImageNet, ImageNette, CelebA and Place365) and show that Jujutsu achieves superior performance, significantly outperforming existing techniques. We further find that Jujutsu defends against different variants of the basic attack, including 1) physical-world attacks; 2) attacks that target diverse classes; 3) attacks that construct patches in different shapes; and 4) adaptive attacks.
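To make the two-step defense concrete, below is a minimal, illustrative sketch (in PyTorch-style Python) of the detect-then-mitigate flow described above. The helper names `locate_suspicious_region`, `inpaint_fn`, and `held_out_image` are hypothetical placeholders introduced for illustration only and do not correspond to the paper's actual implementation.

```python
# A minimal sketch of the Jujutsu-style detect-then-mitigate flow, assuming
# hypothetical helpers: locate_suspicious_region (e.g., saliency-based patch
# localization), inpaint_fn (an image-inpainting model), and held_out_image
# (a clean image used to probe the universal property of the attack).
import torch

def jujutsu_defend(model, image, locate_suspicious_region, inpaint_fn,
                   held_out_image):
    """image, held_out_image: float tensors of shape (1, 3, H, W)."""
    pred = model(image).argmax(dim=1)

    # Detection: find the candidate patch region, transplant those pixels onto
    # an unrelated held-out image, and check whether the prediction transfers
    # with them (exploiting the attack's universal property).
    mask = locate_suspicious_region(model, image)   # (1, 1, H, W), values in {0, 1}
    probe = held_out_image * (1 - mask) + image * mask
    attacked = model(probe).argmax(dim=1) == pred

    if not attacked:
        return image, pred                          # treated as benign

    # Mitigation: discard the corrupted pixels and synthesize plausible
    # semantic content for them via image inpainting, then re-classify.
    clean = inpaint_fn(image * (1 - mask), mask)
    return clean, model(clean).argmax(dim=1)
```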