Deep neural networks (DNNs) have been applied in a wide range of applications, e.g., face recognition and image classification; however, they are vulnerable to adversarial examples. By adding a small amount of imperceptible perturbation, an attacker can easily manipulate the outputs of a DNN. In particular, localized adversarial examples perturb only a small, contiguous region of the target object, which makes them robust and effective in both the digital and physical worlds. Although localized adversarial examples have more severe real-world impacts than traditional pixel attacks, they have not been well addressed in the literature. In this paper, we propose a generic defense system called TaintRadar that accurately detects localized adversarial examples by analyzing the critical regions manipulated by attackers. The main idea is that when critical regions are removed from an input image, the ranking changes of adversarial labels are larger than those of benign labels. Compared with existing defense solutions, TaintRadar can effectively capture sophisticated localized partial attacks, e.g., the eye-glasses attack, while requiring no additional training or fine-tuning of the original model's structure. Comprehensive experiments have been conducted in both the digital and physical worlds to verify the effectiveness and robustness of our defense.
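To make the ranking-change intuition concrete, here is a minimal sketch of the detection signal: occlude each candidate critical region and measure how far the original top-1 label falls in the new ranking. This is an illustration under assumed interfaces, not TaintRadar's actual implementation; `predict_fn`, the `(x, y, w, h)` region format, and the zero-fill masking are all hypothetical choices.

```python
import numpy as np

def rank_of_label(logits, label):
    # Rank 0 = highest-scoring class.
    order = np.argsort(-logits)
    return int(np.where(order == label)[0][0])

def ranking_change_score(predict_fn, image, regions, mask_value=0.0):
    """Illustrative score: the largest drop of the original top-1 label
    in the class ranking after occluding each candidate critical region.

    Assumed interfaces (not from the paper):
      predict_fn(image) -> 1-D array of class logits
      image             -> HxWxC numpy array
      regions           -> list of (x, y, w, h) candidate boxes
    """
    base_logits = predict_fn(image)
    top_label = int(np.argmax(base_logits))
    max_drop = 0
    for (x, y, w, h) in regions:
        masked = image.copy()
        masked[y:y + h, x:x + w, :] = mask_value  # occlude candidate region
        drop = rank_of_label(predict_fn(masked), top_label)
        max_drop = max(max_drop, drop)
    return max_drop
```

Under this sketch, a detector would flag an input as adversarial when the score exceeds a threshold: for a benign image the true label tends to stay near the top of the ranking even with a region occluded, whereas removing an adversarial patch causes the attacker-induced label to fall sharply.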