The advancement of high-quality playback devices makes it urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, but the lack of indistinguishable (hard-to-classify) samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing modeling should pay more attention to indistinguishable samples than to easily classified ones, making correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which uses a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of each sample. In addition, in our experiments we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods in comparison with top-performing systems. Systems trained with the balanced focal loss perform significantly better than those trained with the conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF of 0.0124 and an EER of 0.55%. Furthermore, we present and discuss evaluation results on real replay data in addition to the simulated ASVspoof2019 data, which indicate that anti-spoofing research still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.
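To illustrate how a balanced focal loss can scale the loss dynamically per sample, the following is a minimal sketch of the widely used alpha-balanced focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not necessarily the exact formulation used in D3M: the function names and the values of `alpha` and `gamma` are assumptions chosen for illustration, and the abstract does not state the hyperparameters.

```python
import math


def cross_entropy(p_bonafide, label, eps=1e-12):
    """Standard binary cross-entropy for a single utterance.

    p_bonafide: predicted probability that the utterance is bonafide.
    label: 1 for bonafide, 0 for spoofed.
    """
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def balanced_focal_loss(p_bonafide, label, alpha=0.5, gamma=2.0, eps=1e-12):
    """Alpha-balanced focal loss for a single utterance.

    alpha and gamma are illustrative defaults (assumptions), not values
    reported in the abstract.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    p_t = min(max(p_t, eps), 1.0)
    # alpha_t weights the two classes; (1 - p_t)^gamma shrinks the loss
    # of confidently correct (easy) samples.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy bonafide sample (confidently correct) and a hard one (near the boundary).
easy_focal = balanced_focal_loss(0.95, 1)
hard_focal = balanced_focal_loss(0.55, 1)
easy_ce = cross_entropy(0.95, 1)
hard_ce = cross_entropy(0.55, 1)

print("focal hard/easy ratio:", hard_focal / easy_focal)
print("cross-entropy hard/easy ratio:", hard_ce / easy_ce)
```

In this sketch, the hard sample dominates the easy one far more strongly under the focal loss than under cross-entropy, which is the behavior the abstract describes as giving indistinguishable samples more attention. With `gamma=0` the modulating factor disappears and the loss reduces to a class-weighted cross-entropy.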