Towards the Universal Defense for Query-Based Audio Adversarial Attacks

Recent studies show that deep learning-based automatic speech recognition (ASR) systems are vulnerable to adversarial examples (AEs), which are crafted by adding a small amount of noise to original audio. These AE attacks pose new challenges to deep learning security and have raised significant concerns about deploying ASR systems and devices. Existing defense methods are either limited in applicability or defend only against the final result rather than the generation process. In this work, we propose a novel method that infers the adversary's intent and detects audio adversarial examples based on the AE generation process. The method rests on one observation: many existing audio AE attacks are query-based, which means the adversary must send a continuous stream of similar queries to the target ASR model while generating an audio AE. Inspired by this observation, we propose a memory mechanism that uses audio fingerprint technology to measure the similarity between the current query and a fixed-length memory of past queries. We can thus identify when a sequence of queries is likely part of an attempt to generate audio AEs. Through extensive evaluation on four state-of-the-art audio AE attacks, we demonstrate that our defense identifies the adversary's intent with over 90% accuracy on average. For robustness, we also analyze the proposed defense and its ability to withstand two adaptive attacks. Finally, our scheme works out of the box and is directly compatible with any ensemble of ASR defense models, uncovering audio AE attacks effectively without model retraining.
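The memory mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: the fingerprint here is a simplified band-energy sign scheme chosen for brevity, and the names `fingerprint`, `similarity`, `QueryMemory`, and the parameters `memory_size`, `sim_threshold`, and `max_similar` are all hypothetical, as are their default values.

```python
from collections import deque

import numpy as np


def fingerprint(audio, frame=512, bands=17):
    """Toy binary fingerprint (an assumption, not the paper's scheme):
    sign of band-energy differences taken across frequency, then time."""
    n_frames = len(audio) // frame
    frames = audio[: n_frames * frame].reshape(n_frames, frame)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)) ** 2
    # Split the spectrum into equal-width bands and sum the energy in each.
    edges = np.linspace(0, power.shape[1], bands + 1, dtype=int)
    energy = np.stack(
        [power[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])],
        axis=1,
    )
    band_diff = energy[:, :-1] - energy[:, 1:]      # adjacent bands
    return (band_diff[1:] - band_diff[:-1]) > 0     # adjacent frames


def similarity(fp_a, fp_b):
    """Fraction of matching fingerprint bits over the shared length."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    return float(np.mean(fp_a[:n] == fp_b[:n]))


class QueryMemory:
    """Keeps fingerprints of recent queries and flags a new query when
    too many remembered queries are near-duplicates of it."""

    def __init__(self, memory_size=50, sim_threshold=0.85, max_similar=5):
        self.memory = deque(maxlen=memory_size)  # oldest entries drop out
        self.sim_threshold = sim_threshold
        self.max_similar = max_similar

    def check(self, audio):
        fp = fingerprint(audio)
        similar = sum(
            similarity(fp, old) >= self.sim_threshold for old in self.memory
        )
        self.memory.append(fp)
        return similar >= self.max_similar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal(16000)  # stand-in for one second of audio
    detector = QueryMemory(memory_size=20, max_similar=3)
    for step in range(6):
        # Simulate an attacker refining a perturbation of the same audio.
        query = base + 0.001 * rng.standard_normal(16000)
        print(step, detector.check(query))
```

Because detection works on the incoming query stream rather than on the ASR model, a check of this kind could sit in front of an existing model without retraining it. In practice the fingerprint, memory length, and thresholds would need tuning against real attack and benign traffic.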
