As Voice Processing Systems (VPS) become more prevalent in daily life, through growing reliance on applications such as commercial voice recognition devices and major text-to-speech software, attacks on these systems are becoming more complex, more varied, and are constantly evolving. With the use cases for VPS rapidly expanding into new domains and purposes, the potential consequences for privacy are increasingly dangerous. In addition, the growing number and increased practicality of over-the-air attacks have made system failures much more probable. In this paper, we identify and classify a range of distinct attacks on voice processing systems. Over the years, research has moved from specialized, untargeted attacks that cause system malfunction and denial of service toward more general, targeted attacks that can force an outcome controlled by an adversary. The machine learning systems and deep neural networks in most frequent use today, which are at the core of modern voice processing systems, were built with a focus on performance and scalability rather than security. It is therefore critical to reassess the developing voice processing landscape and to identify the state of current attacks and defenses so that we may suggest future developments and theoretical improvements.