Adversarial patch attacks mislead neural networks by injecting adversarial pixels within a local region. Patch attacks can be highly effective across a variety of tasks and are physically realizable by attaching them (e.g., as a sticker) to real-world objects. Despite the diversity in attack patterns, adversarial patches tend to be highly textured and different in appearance from natural images. We exploit this property and present PatchZero, a general defense pipeline against white-box adversarial patches that requires no retraining of the downstream classifier or detector. Specifically, our defense detects adversarial pixels at the pixel level and "zeros out" the patch region by repainting it with mean pixel values. We further design a two-stage adversarial training scheme to defend against stronger adaptive attacks. PatchZero achieves state-of-the-art defense performance on image classification (ImageNet, RESISC45), object detection (PASCAL VOC), and video classification (UCF101) tasks with little degradation in benign performance. In addition, PatchZero transfers to different patch shapes and attack types.
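The "zero out" step can be illustrated with a small sketch. This is not the PatchZero implementation: it assumes a pixel-level patch detector has already produced a per-pixel adversarial probability map (the `scores` input and the 0.5 threshold are hypothetical), and it assumes the "mean pixel value" is a fixed per-channel dataset mean, here the commonly used ImageNet RGB mean. The helper names `scores_to_mask` and `zero_out_patch` are illustrative only.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, ...]
Image = List[List[Pixel]]

# Per-channel RGB mean commonly used for ImageNet (values in [0, 1]).
# Assumption: the fill value used by PatchZero itself may differ.
IMAGENET_MEAN: Pixel = (0.485, 0.456, 0.406)


def scores_to_mask(scores: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> List[List[bool]]:
    """Binarize hypothetical per-pixel patch probabilities into a mask."""
    return [[s >= threshold for s in row] for row in scores]


def zero_out_patch(image: Image,
                   mask: Sequence[Sequence[bool]],
                   fill: Pixel = IMAGENET_MEAN) -> Image:
    """Return a copy of `image` with every masked pixel repainted with `fill`.

    Unmasked pixels are copied unchanged, so benign regions reach the
    downstream classifier or detector exactly as they were.
    """
    if len(image) != len(mask) or any(
        len(row) != len(mrow) for row, mrow in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")
    return [
        [tuple(fill) if flagged else px for px, flagged in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]


if __name__ == "__main__":
    # A 2x2 RGB image whose right column is flagged as adversarial.
    image = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
             [(0.2, 0.2, 0.2), (0.9, 0.1, 0.5)]]
    scores = [[0.1, 0.9], [0.3, 0.7]]
    print(zero_out_patch(image, scores_to_mask(scores)))
```

Repainting with a dataset mean, rather than black or noise, keeps the masked region close to the statistical "center" of the inputs the frozen downstream model was trained on, which is consistent with the abstract's claim that no retraining is needed.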