Recent advances in sophisticated synthetic speech generated by text-to-speech (TTS) or voice conversion (VC) systems pose threats to existing automatic speaker verification (ASV) systems. Since such synthetic speech is generated by diverse algorithms, the ability to generalize from limited training data is indispensable for a robust anti-spoofing system. In this work, we propose a transfer learning scheme based on the wav2vec 2.0 pretrained model with a variational information bottleneck (VIB) for the speech anti-spoofing task. Evaluation on the ASVspoof 2019 logical access (LA) database shows that our method improves the ability to distinguish unseen spoofed speech from genuine speech, outperforming current state-of-the-art anti-spoofing systems. Furthermore, we show that the proposed system significantly improves performance in low-resource and cross-dataset settings of the anti-spoofing task, demonstrating that our system is also robust with respect to data size and data distribution.
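The sketch below illustrates the general shape of the described architecture: a pretrained wav2vec 2.0 encoder followed by a variational information bottleneck head and a binary spoofing classifier, trained with a cross-entropy plus KL objective. It is a minimal illustration under stated assumptions, not the authors' exact configuration; the checkpoint name, mean pooling over time, bottleneck dimension, and the KL weight beta are all illustrative choices.

```python
# Minimal sketch (assumptions noted above): wav2vec 2.0 front end with a
# variational information bottleneck (VIB) head for binary spoofing detection.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import Wav2Vec2Model


class W2V2VIBClassifier(nn.Module):
    def __init__(self, ckpt="facebook/wav2vec2-xls-r-300m", z_dim=128, n_classes=2):
        super().__init__()
        # Pretrained self-supervised front end (checkpoint name is an assumption)
        self.encoder = Wav2Vec2Model.from_pretrained(ckpt)
        h = self.encoder.config.hidden_size
        self.to_mu = nn.Linear(h, z_dim)       # posterior mean
        self.to_logvar = nn.Linear(h, z_dim)   # posterior log-variance
        self.classifier = nn.Linear(z_dim, n_classes)

    def forward(self, waveform):
        # waveform: (batch, samples) of raw 16 kHz audio
        feats = self.encoder(waveform).last_hidden_state   # (B, T, h)
        pooled = feats.mean(dim=1)                          # mean pooling over time
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        # Reparameterization trick: sample z ~ N(mu, sigma^2)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.classifier(z)
        # KL divergence to the standard normal prior, averaged over the batch
        kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
        return logits, kl


def vib_loss(logits, kl, labels, beta=1e-3):
    # Cross-entropy plus a beta-weighted information bottleneck penalty
    return F.cross_entropy(logits, labels) + beta * kl
```

In such a setup, fine-tuning would backpropagate through both the VIB head and (fully or partially) the wav2vec 2.0 encoder, and at evaluation time one would typically score with the posterior mean rather than a sampled latent.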