Speaker recognition has become popular in many application scenarios, such as smart homes and smart assistants, owing to its convenience for remote control and its cost-effectiveness. The rapid development of Speaker Recognition Systems (SRSs) is inseparable from advances in machine learning, especially neural networks. However, prior work has shown that machine learning models are vulnerable to adversarial attacks in the image domain, which has inspired researchers to explore adversarial attacks and defenses in SRSs. Unfortunately, the existing literature lacks a thorough review of this topic. In this paper, we fill this gap by conducting a comprehensive survey of adversarial attacks and defenses in SRSs. We first introduce the basics of SRSs and the concepts related to adversarial attacks. Then, we propose two sets of criteria to evaluate the performance of attack methods and defense methods in SRSs, respectively. After that, we present taxonomies of existing attack and defense methods, and review them against our proposed criteria. Finally, based on our review, we identify several open issues and outline a number of future directions to motivate research on SRS security.