Audio Event Detection (AED) systems capture audio from the environment and employ deep learning algorithms to detect the presence of specific sounds of interest. In this paper, we evaluate deep learning-based AED systems against evasion attacks through adversarial examples. We run multiple security-critical AED tasks, implemented as CNN classifiers, and generate audio adversarial examples using two different types of noise, namely background and white noise, that an adversary can use to evade detection. We also examine the robustness of existing third-party AED-capable devices, such as Nest devices manufactured by Google, which run their own black-box deep learning models. We show that an adversary can craft audio adversarial inputs that cause AED systems to misclassify, similarly to what prior work has demonstrated with adversarial examples in the image domain. We then seek to improve the classifiers' robustness through countermeasures to the attacks: adversarial training and a custom denoising technique. We show that these countermeasures, applied to audio input either in isolation or in combination, can be successful, yielding performance increases of nearly fifty percent for classifiers under attack.
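To make the attack setting concrete, the following is a minimal sketch of a white-noise-initialized, gradient-based evasion attack on a CNN audio classifier. It is not the paper's implementation: the `TinyAudioCNN` architecture, the sample rate, and the perturbation budget `eps` are illustrative assumptions, and the PGD-style update is one standard way to realize the kind of noise-based adversarial example described above.

```python
# Hypothetical sketch of a noise-based evasion attack on a CNN audio
# classifier. Model, shapes, and hyperparameters are assumptions for
# illustration, not the paper's exact setup.
import torch
import torch.nn as nn

class TinyAudioCNN(nn.Module):
    """Placeholder 1-D CNN over raw waveforms (assumed architecture)."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def white_noise_evasion(model, wav, label, eps=0.01, alpha=0.001, steps=40):
    """Untargeted evasion: maximize the loss on the true label while
    keeping the perturbation inside an L-inf ball of radius eps."""
    delta = torch.empty_like(wav).uniform_(-eps, eps)  # white-noise start
    delta.requires_grad_(True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(model(wav + delta), label)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient ascent on the loss
            delta.clamp_(-eps, eps)             # respect the noise budget
            delta.grad.zero_()
    # Clamp to a valid normalized audio range before playback/saving.
    return (wav + delta).detach().clamp_(-1.0, 1.0)

# Usage on a random 1-second clip (16 kHz sample rate assumed).
model = TinyAudioCNN()
wav = torch.rand(1, 1, 16000) * 2 - 1
label = torch.tensor([3])
adv = white_noise_evasion(model, wav, label)
print("prediction on adversarial clip:", model(adv).argmax(dim=1).item())
```

A background-noise variant of the same idea would replace the uniform initialization with a recorded ambient clip scaled to the budget; the adversarial-training countermeasure mentioned above would, in this sketch, amount to generating such perturbed clips during training and including them, with correct labels, in the training batches.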