While Automatic Speech Recognition (ASR) has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing naive defenses can be partially broken by adaptive attacks. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
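The core idea of Randomized Smoothing applied to ASR can be sketched as follows: transcribe several noise-perturbed copies of the input and combine the hypotheses by voting. This is a minimal illustration, not the paper's implementation; `smoothed_transcribe` and `toy_asr` are hypothetical names, and a real system would use a trained ASR model with ROVER's word-level alignment and voting rather than whole-transcript voting.

```python
import random
from collections import Counter

def smoothed_transcribe(audio, asr, sigma=0.01, n_samples=8, seed=0):
    """Randomized-smoothing sketch for ASR: transcribe several
    Gaussian-perturbed copies of the input and take a majority
    vote over the resulting hypotheses. ROVER would instead
    align the hypotheses and vote word by word."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, sigma) for x in audio]
        votes[asr(noisy)] += 1
    transcript, _ = votes.most_common(1)[0]
    return transcript

# Toy stand-in for an ASR model: maps signal energy to a word,
# so small adversarial perturbations rarely flip the output.
def toy_asr(audio):
    energy = sum(x * x for x in audio) / len(audio)
    return "loud" if energy > 0.5 else "quiet"
```

A small perturbation of the input changes at most a few of the noisy transcriptions, so the majority vote is unchanged; this stability is what smoothing-based certificates formalize for classifiers.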