Automatic speech recognition systems have created exciting possibilities for applications; however, they also open the door to systematic eavesdropping. We propose a method to camouflage a person's voice over-the-air from these systems without inconveniencing the conversation between people in the room. Standard adversarial attacks are not effective in real-time streaming situations because the characteristics of the signal will have changed by the time the attack is executed. We introduce predictive attacks, which achieve real-time performance by forecasting the attack that will be most effective in the future. Under real-time constraints, our method jams the established speech recognition system DeepSpeech 4.17x more than baselines as measured by word error rate (WER), and 7.27x more as measured by character error rate (CER). We furthermore demonstrate that our approach is practically effective in realistic environments over physical distances.
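The results above are stated in terms of word error rate and character error rate. As a minimal sketch of how these standard metrics are commonly computed (not the authors' evaluation code), both are the Levenshtein edit distance between the reference transcript and the recognizer's output, normalized by the length of the reference: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions and N is the number of reference words. The function names are illustrative, and the sketch assumes whitespace tokenization for words and counts spaces as characters for CER; other evaluation setups may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()  # assumption: plain whitespace tokenization
    if not ref_words:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by the reference length (spaces included)."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the bat sat"))  # one substituted word out of three
    print(char_error_rate("the cat sat", "the bat sat"))  # one substituted character out of eleven
```

Because insertions are counted, both metrics can exceed 1.0 when the recognizer emits more tokens than the reference contains, which is why a jamming method can push them well above the error rates of a clean transcript.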