Deep neural networks (DNNs) have been applied in a wide range of applications, e.g., face recognition and image classification; however, they are vulnerable to adversarial examples. By adding a small, imperceptible perturbation, an attacker can easily manipulate the output of a DNN. In particular, localized adversarial examples perturb only a small, contiguous region of the target object, which makes them robust and effective in both the digital and physical worlds. Although localized adversarial examples have more severe real-world impact than traditional pixel-level attacks, they have not been well addressed in the literature. In this paper, we propose a generic defense system called TaintRadar that accurately detects localized adversarial examples by analyzing the critical regions manipulated by attackers. The main idea is that when critical regions are removed from an input image, the ranking changes of adversarial labels are larger than those of benign labels. Compared with existing defense solutions, TaintRadar can effectively capture sophisticated localized partial attacks, e.g., the eyeglasses attack, without requiring additional training or fine-tuning of the original model's structure. Comprehensive experiments in both the digital and physical worlds verify the effectiveness and robustness of our defense.
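The core intuition above lends itself to a simple sketch: occlude candidate critical regions and measure how far the originally predicted label falls in the new ranking. The snippet below is a minimal illustration of that idea only, not the actual TaintRadar pipeline; the `model` callable, the candidate-region boxes, and the decision threshold are all assumptions introduced for this example.

```python
import numpy as np

def label_rank(logits, label):
    """Rank of `label` when classes are sorted by descending logit (0 = top-1)."""
    order = np.argsort(-logits)
    return int(np.where(order == label)[0][0])

def rank_shift_score(model, image, candidate_regions, fill_value=0.0):
    """Illustrative rank-shift check (not the paper's exact algorithm).
    `model` maps an HxWxC float image to a 1-D logit vector (hypothetical API);
    `candidate_regions` is a list of (y0, y1, x0, x1) boxes assumed to come from
    some critical-region analysis, which is not shown here."""
    base_logits = model(image)
    base_label = int(np.argmax(base_logits))
    max_shift = 0
    for (y0, y1, x0, x1) in candidate_regions:
        occluded = image.copy()
        occluded[y0:y1, x0:x1, :] = fill_value  # remove the candidate region
        shifted_logits = model(occluded)
        # How far does the original top label drop once the region is gone?
        shift = label_rank(shifted_logits, base_label)
        max_shift = max(max_shift, shift)
    return base_label, max_shift

# A large max_shift suggests the prediction hinged on a small localized region,
# the signature of a localized adversarial patch; a threshold tuned on benign
# inputs (an assumption here) would flag such inputs as suspicious.
```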