Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification. While progress has been made in defending against imperceptible attacks, it remains unclear how patch-based attacks can be resisted. In this work, we study two different approaches for defending against black-box patch attacks. First, we show that adversarial training, which is successful against imperceptible attacks, has limited effectiveness against state-of-the-art location-optimized patch attacks. Second, we find that compositional deep networks, whose part-based representations yield innate robustness to natural occlusion, are robust to patch attacks on PASCAL3D+ and the German Traffic Sign Recognition Benchmark (GTSRB) without adversarial training. Moreover, compositional models outperform adversarially trained standard models in robustness by a large margin. However, on GTSRB, we observe that they have difficulty discriminating between similar traffic signs with fine-grained differences. We overcome this limitation by introducing part-based finetuning, which improves fine-grained recognition. By leveraging compositional representations, this is the first work to defend against black-box patch attacks without expensive adversarial training. This defense is more robust than adversarial training and more interpretable because it can locate and ignore adversarial patches.
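To make the threat model concrete, the following is a minimal sketch of how a patch attack modifies an input: a localized region of the image is overwritten with adversarial content while the rest is left untouched. The function name, shapes, and patch location are illustrative assumptions, not taken from the paper; a real attack would additionally optimize the patch contents (and, for location-optimized attacks, its position) against the target model.

```python
import numpy as np

def apply_patch(image: np.ndarray, patch: np.ndarray, top: int, left: int) -> np.ndarray:
    """Overwrite a rectangular region of `image` with `patch`.

    This models the perceptible-but-localized change of a patch attack:
    only the pasted region differs from the clean input. Illustrative
    sketch only; not the attack optimization itself.
    """
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

# Example: an 8x8 patch pasted onto a 32x32 RGB input (hypothetical sizes).
image = np.zeros((32, 32, 3), dtype=np.float32)
patch = np.ones((8, 8, 3), dtype=np.float32)
attacked = apply_patch(image, patch, top=4, left=4)
```

In a black-box setting, an attacker would iterate such pastes, querying only the model's outputs to choose the patch contents and location.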