Finding synthetic artifacts of spoofing data will help the anti-spoofing countermeasures (CMs) system discriminate between spoofed and real speech. The Conformer combines the best of convolutional neural network and the Transformer, allowing it to aggregate global and local information. This may benefit the CM system to capture the synthetic artifacts hidden both locally and globally. In this paper, we present the transfer learning based MFA-Conformer structure for CM systems. By pre-training the Conformer encoder with different tasks, the robustness of the CM system is enhanced. The proposed method is evaluated on both Chinese and English spoofing detection databases. In the FAD clean set, proposed method achieves an EER of 0.04%, which dramatically outperforms the baseline. Our system is also comparable to the pre-training methods base on Wav2Vec 2.0. Moreover, we also provide a detailed analysis of the robustness of different models.
翻译:暂无翻译