With advances in deep learning, speaker verification has achieved very high accuracy and is gaining popularity as a biometric authentication option in many everyday settings, especially in the growing market of web services. Compared to traditional passwords, "vocal passwords" are much more convenient because they relieve people of memorizing different passwords. However, new machine learning attacks are putting these voice authentication systems at risk. Without a strong security guarantee, attackers could access legitimate users' web accounts by fooling the deep neural network (DNN) based voice recognition models. In this paper, we demonstrate an easy-to-implement data poisoning attack on voice authentication systems that existing defense mechanisms can hardly detect. We therefore propose a more robust defense method, called Guardian, which is a convolutional neural network-based discriminator. The Guardian discriminator integrates a series of novel techniques, including bias reduction, input augmentation, and ensemble learning. Our approach is able to distinguish about 95% of attacked accounts from normal accounts, making it much more effective than existing approaches, which reach only 60% accuracy.