Automatic speech recognition (ASR) systems can be fooled via targeted adversarial examples, which induce the ASR to produce arbitrary transcriptions in response to altered audio signals. However, state-of-the-art adversarial examples typically have to be fed into the ASR system directly, and are not successful when played in a room. The few published over-the-air adversarial examples fall into one of three categories: they are handcrafted, they are so conspicuous that human listeners can easily recognize the target transcription once they are alerted to its content, or they require precise information about the room where the attack takes place and are hence not transferable to other rooms. In this paper, we demonstrate the first algorithm that produces generic adversarial examples, which remain robust in an over-the-air attack that is not adapted to the specific environment. Hence, no prior knowledge of the room characteristics is required. Instead, we use room impulse responses (RIRs) to compute robust adversarial examples for arbitrary room characteristics and employ the ASR system Kaldi to demonstrate the attack. Further, our algorithm can utilize psychoacoustic methods to hide changes to the original audio signal below the human thresholds of hearing. In practical experiments, we show that the adversarial examples work for varying room setups, and that no direct line-of-sight between speaker and microphone is necessary. As a result, an attacker can create inconspicuous adversarial examples for any target transcription and apply these to arbitrary room setups without any prior knowledge.
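To illustrate the role of RIRs, playback in a room is commonly modeled as a convolution of the audio signal with a room impulse response. The following minimal sketch shows only that simulation step; it is not the algorithm of this paper. The function names, the toy RIR model (a direct-path impulse followed by exponentially decaying noise), and all parameter values are assumptions made for illustration.

```python
import numpy as np


def synthetic_rir(rng, length=2048, decay=0.005):
    """Toy RIR (assumed model): direct path plus exponentially decaying noise."""
    t = np.arange(length)
    rir = rng.standard_normal(length) * np.exp(-decay * t)
    rir[0] = 1.0  # direct path from loudspeaker to microphone
    return rir / np.max(np.abs(rir))  # normalize peak magnitude to 1


def play_over_the_air(audio, rir):
    """Simulate room playback by convolving the signal with the RIR."""
    return np.convolve(audio, rir)[: len(audio)]


rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # stand-in for one second of audio at 16 kHz

# Simulate the same signal in several different (randomly drawn) toy rooms.
simulated = [play_over_the_air(audio, synthetic_rir(rng)) for _ in range(4)]
```

An adversarial example that is meant to survive in unknown rooms would have to remain effective across many such simulated versions, rather than only for the unmodified signal.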