Adversarial patch attacks mislead neural networks by injecting adversarial pixels within a designated local region. Patch attacks can be highly effective in a variety of tasks and are physically realizable via attachment (e.g., a sticker) to real-world objects. Despite the diversity of attack patterns, adversarial patches tend to be highly textured and different in appearance from natural images. We exploit this property and present PatchZero, a task-agnostic defense against white-box adversarial patches. Specifically, our defense detects the adversarial pixels and "zeros out" the patch region by repainting it with mean pixel values. We formulate patch detection as a semantic segmentation task so that our model can generalize to patches of any size and shape. We further design a two-stage adversarial training scheme to defend against stronger adaptive attacks. We thoroughly evaluate PatchZero on image classification (ImageNet, RESISC45), object detection (PASCAL VOC), and video classification (UCF101) datasets. Our method achieves SOTA robust accuracy without any degradation in benign performance.
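The "zero out" step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the patch segmenter has already produced a binary mask of predicted adversarial pixels, and the function name and arguments are hypothetical.

```python
import numpy as np

def zero_out_patch(image, patch_mask, mean_pixel=None):
    """Repaint pixels flagged as adversarial with the image's mean pixel value.

    image: H x W x C float array.
    patch_mask: H x W boolean array from a (hypothetical) patch segmentation
    model -- True where a pixel is predicted to belong to the adversarial patch.
    mean_pixel: optional per-channel mean; computed from the image if omitted.
    """
    if mean_pixel is None:
        # Per-channel mean over all pixels of this image.
        mean_pixel = image.reshape(-1, image.shape[-1]).mean(axis=0)
    out = image.copy()
    out[patch_mask] = mean_pixel  # repaint the predicted patch region
    return out
```

In the full defense, `patch_mask` would come from the segmentation network, and the repainted image is then passed unchanged to the downstream classifier or detector, which is why the approach is task-agnostic.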