Adversarial examples pose a serious threat to the availability and integrity of machine-learning-based systems. While the feasibility of such attacks was first observed in the domain of image processing, recent research shows that speech recognition is also susceptible to adversarial attacks. However, reliably bridging the air gap (i.e., making adversarial examples work when played back and recorded via a microphone) has so far eluded researchers. We find that, due to flaws in the generation process, state-of-the-art adversarial example generation methods overfit to the binning operation of the target speech recognition system (e.g., Mozilla Deepspeech). We devise an approach that mitigates this flaw and find that it improves the generation of adversarial examples under varying offsets. We confirm the significant improvement achieved by our approach through an empirical comparison of edit distances in a realistic over-the-air setting. Our approach represents a significant step towards practical over-the-air attacks. We publish our code and a ready-to-use implementation of our approach.
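The core idea described above is to keep the perturbation from overfitting to a single frame alignment of the recognizer's binning (its fixed STFT framing). The following is a minimal sketch of that idea in Python/PyTorch, not the published implementation: the model and loss_fn callables, the parameter names, and the offset range are illustrative assumptions.

    import torch

    def generate_resistant_example(model, loss_fn, audio, target,
                                   n_steps=1000, max_offset=320,
                                   lr=1e-3, eps=0.05):
        # Optimize an additive perturbation `delta` that keeps fooling the
        # recognizer regardless of how the waveform aligns with its frames.
        # `model` and `loss_fn` are assumed stand-ins for the target system.
        delta = torch.zeros_like(audio, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        n = audio.shape[-1]
        for _ in range(n_steps):
            # Prepend a random number of zero samples at each step so the
            # binning of the input shifts, preventing the perturbation
            # from overfitting to one fixed frame alignment.
            off = int(torch.randint(0, max_offset, (1,)))
            x = torch.nn.functional.pad(audio + delta, (off, 0))[..., :n]
            loss = loss_fn(model(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)  # bound the perturbation magnitude
        return (audio + delta).detach()

In a real attack against Mozilla Deepspeech, model would be the acoustic model and loss_fn a CTC loss towards the target transcription; at playback time the recording offset is arbitrary, which is exactly the condition the random shifts simulate during generation.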