PatchCleanser: Certifiably Robust Defense against Adversarial Patches for Any Image Classifier

The adversarial patch attack against image classification models injects adversarially crafted pixels into a restricted, localized image region (i.e., a patch) to induce model misclassification. This attack can be realized in the physical world by printing the patch and attaching it to the victim object, and thus poses a real-world threat to computer vision systems. To counter this threat, we propose PatchCleanser, a certifiably robust defense against adversarial patches that is compatible with any image classifier. In PatchCleanser, we perform two rounds of pixel masking on the input image to neutralize the effect of the adversarial patch. In the first round of masking, we apply a set of carefully generated masks to the input image and evaluate the model prediction on every masked image. If the model predictions on all one-masked images agree unanimously, we output the agreed-upon label. Otherwise, we perform a second round of masking to settle the disagreement, in which we evaluate model predictions on two-masked images to robustly recover the correct prediction label. We can prove that, for certain images (those on which every two-masked prediction is correct), our defense always makes the correct prediction against any adaptive white-box attacker within our threat model, thereby achieving certified robustness. We extensively evaluate our defense on the ImageNet, ImageNette, CIFAR-10, CIFAR-100, SVHN, and Flowers-102 datasets and demonstrate that it achieves clean accuracy similar to that of state-of-the-art classification models while significantly improving certified robustness over prior work. For example, on the 1000-class ImageNet dataset, our defense achieves 83.8% top-1 clean accuracy and 60.4% top-1 certified robust accuracy against a 2%-pixel square patch placed anywhere on the image.
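The two-round masking procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`generate_mask_set`, `apply_mask`, `double_masking`, `certify`) are hypothetical, the classifier is assumed to be any callable mapping an `(H, W, C)` NumPy array to an integer label, masked pixels are assumed to be filled with 0, and the image and patch are assumed to be square. The mask geometry (stride `s = ceil((n - p + 1) / k)` and mask width `p + s - 1` for `k` mask positions per axis) is one way to guarantee that every possible patch location lies entirely inside at least one mask; it is an assumption here, since the abstract only says the masks are "carefully generated".

```python
from collections import Counter
from typing import Callable, List

import numpy as np

# Hypothetical interface: any classifier mapping an (H, W, C) image to a label.
Classifier = Callable[[np.ndarray], int]


def generate_mask_set(img_size: int, patch_size: int, k: int) -> List[np.ndarray]:
    """Return square boolean masks (True = masked) such that every
    patch_size x patch_size location on an img_size x img_size image lies
    entirely inside at least one mask. Assumes k mask positions per axis."""
    stride = -(-(img_size - patch_size + 1) // k)  # ceiling division
    mask_size = patch_size + stride - 1
    # Clamp the last offsets so every mask stays inside the image.
    starts = sorted({min(i * stride, img_size - mask_size) for i in range(k)})
    masks = []
    for y in starts:
        for x in starts:
            m = np.zeros((img_size, img_size), dtype=bool)
            m[y:y + mask_size, x:x + mask_size] = True
            masks.append(m)
    return masks


def apply_mask(image: np.ndarray, mask: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Copy the image and overwrite every masked pixel (all channels) with `fill`."""
    out = image.copy()
    out[mask] = fill
    return out


def double_masking(image: np.ndarray, model: Classifier, masks: List[np.ndarray]) -> int:
    # First round: one prediction per one-masked image.
    preds = [model(apply_mask(image, m)) for m in masks]
    majority, _ = Counter(preds).most_common(1)[0]
    if all(p == majority for p in preds):
        return majority  # unanimous agreement
    # Second round: only for masks whose prediction disagrees with the majority.
    for first, label in zip(masks, preds):
        if label == majority:
            continue
        second = [model(apply_mask(image, first | other)) for other in masks]
        if all(p == label for p in second):
            return label  # every two-masked image agrees with this disagreer
    return majority  # no disagreer survived the second round


def certify(image: np.ndarray, label: int, model: Classifier,
            masks: List[np.ndarray]) -> bool:
    # Every pair of masks (a mask paired with itself is a single mask) must yield `label`.
    return all(model(apply_mask(image, a | b)) == label for a in masks for b in masks)


if __name__ == "__main__":
    # Toy setup: the "model" predicts 1 whenever any pixel exceeds 5.
    def toy_model(x: np.ndarray) -> int:
        return int((x > 5).any())

    masks = generate_mask_set(img_size=8, patch_size=2, k=3)
    attacked = np.zeros((8, 8, 3))
    attacked[2:4, 2:4] = 10.0  # a 2x2 "patch"
    print(double_masking(attacked, toy_model, masks))  # prints 0
```

In the toy run, only the mask covering rows and columns 0 to 3 hides the patch, so that single first-round prediction (0) disagrees with the majority (1); the second round then confirms it because every two-masked image still hides the patch. The second round runs only for disagreeing masks, so an image with unanimous first-round predictions costs one model call per mask. `certify` checks the condition that, as we understand the paper's analysis, implies certified robustness when the mask set covers every patch location: every two-masked prediction equals the true label.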
