Automatic speech recognition (ASR) systems are ubiquitous in everyday devices, yet they are vulnerable to adversarial attacks, in which subtly manipulated input samples fool the system's recognition. While adversarial examples have been analyzed for various English ASR systems, no cross-language comparative vulnerability analysis exists. We compare the attackability of a German and an English ASR system, using DeepSpeech as an example, and investigate whether one of the language models is more susceptible to manipulation than the other. The results of our experiments suggest statistically significant differences between English and German in the computational effort required to generate successful adversarial examples. This finding encourages further research into language-dependent characteristics in the robustness analysis of ASR.
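The core idea behind such attacks can be illustrated with a minimal sketch: starting from a benign input, one repeatedly steps against the sign of the loss gradient (an FGSM-style iterative attack) until the model's prediction flips, and the number of steps serves as a crude proxy for the computational effort the abstract refers to. This toy example uses a random linear classifier in place of a real ASR model; all names and parameters here are hypothetical, and a real attack on DeepSpeech would instead descend the CTC loss over audio waveforms.

```python
import numpy as np

# Hypothetical stand-in "model": logits = W @ x. A real ASR attack
# would backpropagate through an acoustic model and CTC loss instead.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))

def logits(x):
    return W @ x

def loss_grad(x, target):
    """Gradient of the cross-entropy loss w.r.t. the input x."""
    z = logits(x)
    p = np.exp(z - z.max())
    p /= p.sum()
    onehot = np.eye(2)[target]
    return W.T @ (p - onehot)

def attack(x, target, eps=0.01, max_steps=1000):
    """Iterative sign-gradient attack: perturb x until the model
    predicts `target`; return the adversarial input and step count."""
    x_adv = x.copy()
    for step in range(1, max_steps + 1):
        if logits(x_adv).argmax() == target:
            return x_adv, step
        x_adv -= eps * np.sign(loss_grad(x_adv, target))
    return x_adv, max_steps

x = rng.normal(size=16)
target = 1 - logits(x).argmax()   # aim for the other label
x_adv, steps = attack(x, target)
```

Comparing the step counts (or perturbation magnitudes) needed for attacks on models trained on different languages is one way to operationalize the cross-language comparison the abstract describes.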