Fake audio detection is a growing concern, and several relevant datasets have been designed for research. However, there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate the fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection, but also for identifying the algorithm that generated a fake utterance, for audio forensics. Baseline results are presented with analysis. The results show that generalization remains challenging for fake audio detection methods. The FAD dataset is publicly available.
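To make the noise-addition step concrete, the following is a minimal sketch of mixing a noise clip into an utterance at a target signal-to-noise ratio. It is not the procedure used to build FAD. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the synthetic sine "speech", the Gaussian noise, and the five SNR values are all assumptions chosen for illustration; the abstract does not state the actual levels.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        # Tile the noise clip so it covers the whole utterance.
        repeats = math.ceil(len(speech) / len(noise))
        noise = list(noise) * repeats
    noise = noise[:len(speech)]

    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture, given the clean reference signal."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Hypothetical SNR levels in dB; placeholders, not taken from the paper.
SNR_LEVELS_DB = [0, 5, 10, 15, 20]

# Synthetic stand-ins for a clean utterance and a noise recording.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

noisy_versions = {snr: mix_at_snr(speech, noise, snr) for snr in SNR_LEVELS_DB}
```

Scaling the noise rather than the speech keeps the utterance level unchanged across conditions, so any difference between the noisy versions comes from the noise alone.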