Deep neural network-based image classifiers are vulnerable to adversarial perturbations: they can be easily fooled by adding small, imperceptible artificial perturbations to input images. As one of the most effective defense strategies, adversarial training was proposed to address this vulnerability, where adversarial examples are created and injected into the training data during training. Attacks on and defenses of classification models have been studied intensively in past years. Semantic segmentation, as an extension of classification, has also received great attention recently. Recent work shows that a large number of attack iterations is required to create effective adversarial examples that fool segmentation models, which makes both robustness evaluation and adversarial training of segmentation models challenging. In this work, we propose an effective and efficient segmentation attack method, dubbed SegPGD. We also provide a convergence analysis showing that SegPGD creates more effective adversarial examples than PGD under the same number of attack iterations. Furthermore, we propose to apply SegPGD as the underlying attack method for segmentation adversarial training. Since SegPGD creates more effective adversarial examples, adversarial training with SegPGD boosts the robustness of segmentation models. We verify our proposals with experiments on popular segmentation model architectures and standard segmentation datasets.
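To make the idea concrete, below is a minimal PyTorch sketch of a PGD-style attack on a segmentation model that re-weights the per-pixel loss between already-misclassified and still-correct pixels over the iterations. This is an illustrative sketch, not the paper's reference implementation: the function name, the hyperparameter defaults, and the linear weighting schedule `lam = t / (2 * T)` are assumptions made here for exposition, and the model is assumed to map an image batch of shape (N, 3, H, W) to per-pixel logits of shape (N, C, H, W).

```python
import torch
import torch.nn.functional as F

def segpgd_attack(model, images, labels, eps=8/255, alpha=2/255, num_iters=7):
    """Illustrative PGD-style attack on a segmentation model (sketch).

    At iteration t, the per-pixel cross-entropy loss is re-weighted:
    correctly classified pixels get weight (1 - lam), already-misclassified
    pixels get weight lam, with lam = t / (2 * T) as an assumed schedule.
    """
    # Random start inside the L-infinity ball of radius eps.
    adv = images + torch.empty_like(images).uniform_(-eps, eps)
    adv = adv.clamp(0, 1).detach()

    for t in range(num_iters):
        adv.requires_grad_(True)
        logits = model(adv)                                   # (N, C, H, W)
        preds = logits.argmax(dim=1)                          # (N, H, W)
        pixel_loss = F.cross_entropy(logits, labels,
                                     reduction="none")        # (N, H, W)

        # Shift weight from still-correct pixels toward misclassified ones.
        lam = t / (2.0 * num_iters)
        correct = (preds == labels).float()
        loss = ((1 - lam) * correct * pixel_loss
                + lam * (1 - correct) * pixel_loss).mean()

        # One signed-gradient ascent step, then project back into the ball.
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()
        adv = images + (adv - images).clamp(-eps, eps)
        adv = adv.clamp(0, 1).detach()
    return adv
```

For adversarial training as described above, each training mini-batch would first be perturbed with such an attack and the model then updated on the resulting adversarial examples; the weighting keeps gradient signal focused on pixels that are not yet fooled, which is the motivation for needing fewer attack iterations than plain PGD.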