Existing fake audio detection systems often rely on expert experience to design acoustic features or to manually tune the hyperparameters of the network structure. However, manual adjustment of these parameters has a considerable influence on the results, and it is practically impossible to find the best parameter set by hand. This paper therefore proposes a fully automated end-to-end fake audio detection method. We first use the wav2vec pre-trained model to obtain a high-level representation of the speech. Furthermore, for the network structure, we use a modified version of differentiable architecture search (DARTS) named light-DARTS. It learns deep speech representations while automatically learning and optimizing complex neural structures composed of convolutional operations and residual blocks. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system achieves an equal error rate (EER) of 1.08%, outperforming the state-of-the-art single systems.
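As a minimal sketch of the front-end step described above (not the authors' exact pipeline), the snippet below shows how a wav2vec-style pre-trained model can supply high-level speech representations to a downstream countermeasure classifier. The torchaudio XLSR-53 bundle and the file name "utterance.flac" are assumptions used only for illustration; the searched light-DARTS back end is not reproduced here.

import torch
import torchaudio

# Load a publicly available wav2vec 2.0 (XLSR-53) bundle as a stand-in
# for the paper's wav2vec front end, and keep it frozen for feature extraction.
bundle = torchaudio.pipelines.WAV2VEC2_XLSR53
model = bundle.get_model().eval()

# Hypothetical input utterance; resample to the model's expected rate if needed.
waveform, sr = torchaudio.load("utterance.flac")
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # extract_features returns per-layer feature tensors; the last layer
    # gives a (batch, frames, dim) high-level representation that would be
    # fed to the automatically searched classification back end.
    features, _ = model.extract_features(waveform)

hidden = features[-1]
print(hidden.shape)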