We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second). We apply our iterative optimization-based attack to Mozilla's DeepSpeech implementation end-to-end, and show that it has a 100% success rate. The feasibility of this attack introduces a new domain in which to study adversarial examples.
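To give a concrete feel for what an iterative optimization-based targeted attack looks like, the following is a minimal sketch on a toy problem. It is not the attack applied to DeepSpeech: the 5-class linear classifier `W`, the synthetic 440 Hz tone, the cross-entropy loss, the signed-gradient update, and the parameters `eps`, `lr`, and `steps` are all assumptions made for illustration. The sketch only shows the general pattern of repeatedly nudging a small perturbation `delta` until the model outputs a chosen target label while the perturbation stays within a fixed bound.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def targeted_attack(x, W, target, eps=0.05, lr=0.001, steps=200):
    """Iteratively search for delta with |delta_i| <= eps such that the toy
    classifier argmax(W @ (x + delta)) outputs `target`."""
    delta = np.zeros_like(x)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        logits = W @ (x + delta)
        if np.argmax(logits) == target:
            break  # stop as soon as the target label wins
        # Gradient of cross-entropy(target) w.r.t. the input of a linear model.
        grad = W.T @ (softmax(logits) - onehot)
        # Signed gradient step, then project back into the eps-box.
        delta = np.clip(delta - lr * np.sign(grad), -eps, eps)
    return delta


def demo(seed=0):
    """Run the toy attack; every quantity here is hypothetical."""
    rng = np.random.default_rng(seed)
    t = np.arange(16000) / 16000.0
    x = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    W = rng.standard_normal((5, x.size))     # stand-in 5-class linear model
    original = int(np.argmax(W @ x))
    target = (original + 1) % W.shape[0]     # any label other than the original
    delta = targeted_attack(x, W, target)
    adversarial = int(np.argmax(W @ (x + delta)))
    return original, target, adversarial, float(np.abs(delta).max())
```

Calling `demo()` returns the original label, the chosen target, the label assigned to the perturbed waveform, and the largest absolute sample change. A real speech-to-text system would replace the linear model with a neural network and the class label with a target transcription, so the loss and optimizer would differ accordingly.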