Audio DeepFakes allow the creation of high-quality, convincing utterances and therefore pose a threat due to their potential applications, such as impersonation or fake news. Methods for detecting these manipulations should generalize well and behave stably, so that they remain robust against attacks conducted with techniques not explicitly included in training. In this work, we introduce the Attack Agnostic Dataset, a combination of two audio DeepFake datasets and one anti-spoofing dataset that, thanks to the disjoint use of attacks, can lead to better generalization of detection methods. We present a thorough analysis of current DeepFake detection methods and consider different audio features (front-ends). In addition, we propose a model based on LCNN with an LFCC and mel-spectrogram front-end, which not only shows good generalization and stability but also improves over the LFCC-based model: we decrease the standard deviation on all folds and the equal error rate (EER) in two folds by up to 5%.
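The results above are stated in terms of EER per fold and its standard deviation across folds. As a rough illustration only (this is not the authors' evaluation code), the sketch below shows one common way to estimate EER from detector scores and to summarize it over folds. It assumes that higher scores mean "bona fide". The names `compute_eer` and `summarize_folds` are hypothetical, and the choice of population standard deviation is an assumption.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate from two sets of detector scores.

    Assumption: a higher score means the utterance looks more bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score observed.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


def summarize_folds(fold_eers):
    """Return the mean and population standard deviation of per-fold EERs."""
    eers = np.asarray(fold_eers, dtype=float)
    return float(eers.mean()), float(eers.std())
```

A lower standard deviation across folds would then indicate that the detector behaves more consistently when the attacks held out from training change, which is the notion of stability the abstract refers to.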