Numerous recent studies have demonstrated how Deep Neural Network (DNN) classifiers can be fooled by adversarial examples, in which an attacker adds perturbations to an original sample, causing the classifier to misclassify it. Adversarial attacks that render DNNs vulnerable in real life represent a serious threat in autonomous vehicles, malware filters, and biometric authentication systems. In this paper, we apply the Fast Gradient Sign Method (FGSM) to introduce perturbations to a facial image dataset and then test the output on a different classifier that we trained ourselves, to analyze the transferability of this method. Next, we craft a variety of black-box attack algorithms on a facial image dataset, assuming minimal adversarial knowledge, to further assess the robustness of DNNs in facial recognition. While experimenting with different image distortion techniques, we focus on modifying single optimal pixels by a large amount, modifying all pixels by a smaller amount, or combining these two attack approaches. While our single-pixel attacks achieved about a 15% average decrease in the classifier's confidence for the true class, the all-pixel attacks were more successful, achieving up to an 84% average decrease in confidence along with an 81.6% misclassification rate for the attack tested at the highest level of perturbation. Even with these high levels of perturbation, the face images remained identifiable to a human. Understanding how these noised and perturbed images baffle classification algorithms can yield valuable advances in training DNNs against defense-aware adversarial attacks, as well as in adaptive noise reduction techniques. We hope our research may help to advance the study of adversarial attacks on DNNs and of defensive mechanisms to counteract them, particularly in the facial recognition domain.
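The FGSM perturbation referred to above has a simple closed form: each input pixel is shifted by a fixed step in the direction of the sign of the loss gradient with respect to the input. The following is a minimal sketch of that step in NumPy; the toy logistic classifier (`logistic_input_grad`) and all variable names are illustrative stand-ins for a real DNN and are not from the paper itself.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    """One FGSM step: shift every pixel by eps in the direction of the
    sign of the loss gradient, then clip back to the valid [0, 1] range."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def logistic_input_grad(x, w, y):
    """Gradient of the cross-entropy loss w.r.t. the input x for a toy
    logistic classifier p = sigmoid(w . x), used here as a hypothetical
    stand-in for a DNN's backpropagated input gradient."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * w

rng = np.random.default_rng(0)
x = rng.random(64)            # a flattened 8x8 "face" image in [0, 1]
w = rng.standard_normal(64)   # toy classifier weights
grad = logistic_input_grad(x, w, y=1.0)
x_adv = fgsm_perturb(x, grad, eps=0.05)
```

Note that the per-pixel change is bounded by `eps`, which is what keeps FGSM-perturbed faces recognizable to a human while still degrading classifier confidence.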