In this paper we investigate speech denoising as a defense against adversarial attacks on automatic speech recognition (ASR) systems. Adversarial attacks attempt to force misclassification by adding small perturbations to the original speech signal. We propose to counteract this by employing a neural-network-based denoiser as a pre-processor in the ASR pipeline. The denoiser is independent of the downstream ASR model and can therefore be rapidly deployed in existing systems. We found that training the denoiser with a perceptually motivated loss function increased adversarial robustness without compromising ASR performance on benign samples. Our defense was evaluated (as part of the DARPA GARD program) against the 'Kenansville' attack strategy across a range of attack strengths and speech samples. At an attack strength of 20 dB signal-to-noise ratio (SNR), we observed an average improvement in Word Error Rate (WER) of about 7.7% over the undefended model.
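To make the defense pipeline concrete, the sketch below illustrates the general idea of a model-agnostic denoising pre-processor trained with a perceptual objective. It is not the authors' implementation: the convolutional `Denoiser`, the multi-resolution STFT magnitude loss (one common choice of perceptually motivated loss), and the `asr_model` placeholder are illustrative assumptions.

```python
# Hypothetical sketch of a denoiser used as a pre-processor in front of an
# arbitrary ASR model, with a perceptual (multi-resolution STFT) training loss.
import torch
import torch.nn as nn


class Denoiser(nn.Module):
    """Small 1-D convolutional denoiser operating on raw waveforms (illustrative)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> denoised waveform of the same shape
        return self.net(wav.unsqueeze(1)).squeeze(1)


def stft_mag(wav: torch.Tensor, n_fft: int, hop: int) -> torch.Tensor:
    """Magnitude spectrogram at one STFT resolution."""
    window = torch.hann_window(n_fft, device=wav.device)
    spec = torch.stft(wav, n_fft, hop_length=hop, window=window, return_complex=True)
    return spec.abs()


def perceptual_loss(denoised: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """Multi-resolution STFT magnitude distance, used here as a stand-in for a
    perceptually motivated training objective."""
    loss = torch.zeros((), device=denoised.device)
    for n_fft, hop in [(512, 128), (1024, 256), (2048, 512)]:
        loss = loss + torch.mean(
            torch.abs(stft_mag(denoised, n_fft, hop) - stft_mag(clean, n_fft, hop))
        )
    return loss


def defended_transcribe(asr_model, denoiser: Denoiser, wav: torch.Tensor):
    """Defense pipeline: denoise first, then pass the audio to any downstream
    recognizer (asr_model is a placeholder for an existing ASR system)."""
    with torch.no_grad():
        cleaned = denoiser(wav)
    return asr_model(cleaned)
```

Because the denoiser only consumes and produces waveforms, it can be dropped in front of an existing recognizer without retraining the ASR model, which is the deployment property emphasized in the abstract.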