In the past few years, deep learning systems have been shown to be highly vulnerable to attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal so that humans still recognise the same words, while the ASR system is steered towards predicting a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks that removes fast-changing features from the audio signal, either by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in this way. The resulting models perform well on benign data while being significantly more robust against targeted adversarial attacks: our final proposed model performs similarly to the baseline model on clean data while being more than four times more robust.
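To make the proposed pre-processing concrete, the sketch below removes fast-changing components from an input before it reaches the ASR front end, once with a Butterworth low-pass filter on the raw waveform and once with a minimal linear slow feature analysis on a framed feature sequence. This is an illustrative sketch, not the paper's exact configuration: the filter order, the cutoff frequency, and the function names `lowpass` and `linear_sfa` are all assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lowpass(audio: np.ndarray, sample_rate: int, cutoff_hz: float = 7000.0) -> np.ndarray:
    """Zero-phase Butterworth low-pass filter: suppresses fast-changing
    (high-frequency) components of the waveform. The order (N=10) and the
    cutoff are illustrative assumptions, not the paper's settings."""
    sos = butter(N=10, Wn=cutoff_hz, btype="lowpass", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, audio)


def linear_sfa(x: np.ndarray, n_components: int) -> np.ndarray:
    """Minimal linear slow feature analysis on a (T, D) feature sequence:
    whiten the data, then keep the directions whose temporal derivative has
    the smallest variance, i.e. the 'slowest' features."""
    x = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    keep = evals > 1e-12                         # drop near-singular directions
    whiten = evecs[:, keep] / np.sqrt(evals[keep])
    z = x @ whiten                               # whitened signal, unit variance
    dz = np.diff(z, axis=0)                      # finite-difference time derivative
    _, devecs = np.linalg.eigh(np.cov(dz, rowvar=False))
    return z @ devecs[:, :n_components]          # eigh sorts ascending: slowest first
```

In a hybrid ASR pipeline, the filtered waveform (or the SFA-projected feature sequence) would replace the raw input both when training the acoustic model and at inference time, which is what allows clean-data performance to be compared directly against the baseline model.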