In recent years, significant progress has been made in automatic speech recognition (ASR) based on deep models, leading to its widespread deployment in the real world. At the same time, adversarial attacks against deep ASR systems are highly successful. Various methods have been proposed to defend ASR systems against these attacks. However, existing classification-based methods focus on the design of deep learning models and leave domain-specific features largely unexplored. This work leverages filter bank-based features to better capture the characteristics of attacks for improved detection. Furthermore, the paper analyses the potential of using the speech and non-speech parts separately for detecting adversarial attacks. Finally, considering the adverse environments in which ASR systems may be deployed, we study the impact of acoustic noise of various types and signal-to-noise ratios. Extensive experiments show that the inverse filter bank features generally perform better in both clean and noisy environments, that detection is effective using either the speech or the non-speech part, and that acoustic noise can substantially degrade detection performance.