Face recognition (FR) models can be easily fooled by adversarial examples, which are crafted by adding imperceptible perturbations to benign face images. The existence of adversarial face examples poses a serious threat to the security of society. To help build a more sustainable digital nation, in this paper we improve the transferability of adversarial face examples to expose more blind spots in existing FR models. Although generating hard samples has proven effective for improving model generalization during training, whether this idea can improve the transferability of adversarial face examples remains unexplored. To this end, based on the properties of hard samples and the symmetry between training tasks and adversarial attack tasks, we propose the concept of hard models, which play a role in adversarial attacks analogous to that of hard samples in training. Building on this concept, we propose a novel attack method called Beneficial Perturbation Feature Augmentation Attack (BPFA), which reduces the overfitting of adversarial examples to the surrogate FR model by continually generating new hard models on which to craft the adversarial examples. Specifically, during backpropagation, BPFA records the gradients on pre-selected feature maps and uses the gradient on the input image to update the adversarial example. During the next forward propagation, BPFA adds beneficial perturbations, derived from the recorded gradients, to the corresponding feature maps to increase the loss. Extensive experiments demonstrate that BPFA significantly boosts the transferability of adversarial attacks on FR.
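To make the described loop concrete, below is a minimal PyTorch sketch of the idea: forward hooks inject beneficial perturbations into pre-selected feature maps using gradients recorded in the previous backward pass, while the input gradient updates the adversarial example. The function name `bpfa_attack`, the sign-based update steps, and all hyperparameters (`eps`, `alpha`, `beta`, `n_iters`) are illustrative assumptions, not the authors' released implementation.

```python
import torch

def bpfa_attack(model, layers, x_benign, loss_fn,
                eps=8/255, alpha=1/255, beta=0.01, n_iters=10):
    """Craft an adversarial face image against a surrogate FR model.

    model:   surrogate FR model mapping images to embeddings
    layers:  pre-selected torch.nn.Module layers whose outputs are the
             feature maps that receive beneficial perturbations
    loss_fn: attack loss computed on the embedding; the attacker ascends it
    """
    x_adv = x_benign.clone().detach()
    saved_grads = {}   # feature-map gradients recorded in the previous backward pass
    feats = {}         # feature maps produced in the current forward pass

    def make_hook(name):
        def hook(module, inputs, out):
            # Inject a beneficial perturbation along the recorded gradient
            # direction; this increases the loss, turning the surrogate into
            # a new "hard model" for the current iteration.
            if name in saved_grads:
                out = out + beta * saved_grads[name].sign()
            out.retain_grad()          # so its gradient can be read after backward
            feats[name] = out
            return out
        return hook

    handles = [layer.register_forward_hook(make_hook(str(i)))
               for i, layer in enumerate(layers)]

    for _ in range(n_iters):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv))
        model.zero_grad()
        loss.backward()

        # Record the gradients on the feature maps for the next forward pass...
        saved_grads = {k: v.grad.detach() for k, v in feats.items()}
        # ...and use the gradient on the input image to update the example.
        x_adv = x_adv.detach() + alpha * x_adv.grad.sign()
        x_adv = x_benign + (x_adv - x_benign).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)

    for h in handles:
        h.remove()
    return x_adv
```

For a dodging attack, `loss_fn` could be, for example, `lambda emb: -torch.cosine_similarity(emb, source_emb).mean()`, so that ascending the loss pushes the adversarial embedding away from the source identity; `source_emb` here is an assumed precomputed embedding of the benign face. Note that in the first iteration no recorded gradients exist yet, so the feature maps pass through unperturbed.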