Audio DeepFakes are artificially generated utterances created with deep learning methods, whose main aim is to fool listeners; most of such audio is highly convincing. Their quality is sufficient to pose a serious threat in terms of security and privacy, such as the reliability of news or defamation. To counter these threats, multiple neural-network-based methods for detecting generated speech have been proposed. In this work, we cover the topic of adversarial attacks, which decrease the performance of detectors by adding subtle (difficult for a human to spot) changes to the input data. Our contribution consists of evaluating the robustness of three detection architectures against adversarial attacks in two scenarios (white-box and via the transferability mechanism) and subsequently enhancing it through adversarial training performed with our novel adaptive training method.
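To make the notion of an adversarial attack concrete, the sketch below shows one common way such subtle perturbations can be crafted, using the fast gradient sign method (FGSM); this is only an illustrative example under assumed interfaces, not the attack setup used in this work. The `detector`, `waveform`, and `label` objects are hypothetical placeholders for a spoofing-detection model and its input.

```python
# Minimal FGSM sketch against a hypothetical speech deepfake detector.
# Assumes `detector` is a torch.nn.Module mapping raw waveforms to class logits.
import torch
import torch.nn.functional as F

def fgsm_attack(detector: torch.nn.Module,
                waveform: torch.Tensor,   # shape (batch, samples), values in [-1, 1]
                label: torch.Tensor,      # ground-truth class indices (assumed convention)
                epsilon: float = 1e-3) -> torch.Tensor:
    """Return a perturbed waveform that stays close to the original
    yet pushes the detector toward an incorrect decision."""
    waveform = waveform.clone().detach().requires_grad_(True)
    logits = detector(waveform)
    loss = F.cross_entropy(logits, label)
    loss.backward()
    # One gradient-sign step: the perturbation is small enough to be hard
    # for a listener to notice, but can degrade the detector's accuracy.
    adversarial = waveform + epsilon * waveform.grad.sign()
    return adversarial.clamp(-1.0, 1.0).detach()
```

Adversarial training, in turn, amounts to generating such perturbed examples during training and including them in the loss, which is the mechanism our adaptive training method builds on.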