Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge comes from unseen attacks enabled by advanced speech synthesis technologies. Our previous research on one-class learning improved generalization to unseen attacks by compacting bona fide speech in the embedding space. However, such compactness does not account for the diversity of speakers. In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes spoofing attacks away from all the attractors in a high-dimensional embedding space. For training, we propose an algorithm for the co-optimization of bona fide speech clustering and bona fide/spoof classification. For inference, we propose strategies to enable anti-spoofing for speakers without enrollment. Our proposed system outperforms existing state-of-the-art single systems, with a relative improvement of 38% in equal error rate (EER) on the ASVspoof2019 LA evaluation set.
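The multi-center idea described above can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption made for illustration: cosine similarity as the distance measure, a softplus margin loss, the margin and scale values (`m_bona`, `m_spoof`, `alpha`), attractors computed as the mean of normalized bona fide embeddings per speaker, and a maximum-over-attractors fallback for speakers without enrollment. All function names are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def update_attractors(embeddings, labels, speakers):
    """Assumed clustering step: one attractor per speaker, taken as the
    mean of that speaker's normalized bona fide embeddings (label 0)."""
    sums, counts = {}, {}
    for emb, label, spk in zip(embeddings, labels, speakers):
        if label != 0:
            continue  # spoofed utterances never define an attractor
        unit = normalize(emb)
        if spk not in sums:
            sums[spk], counts[spk] = [0.0] * len(unit), 0
        sums[spk] = [s + u for s, u in zip(sums[spk], unit)]
        counts[spk] += 1
    return {spk: [s / counts[spk] for s in total] for spk, total in sums.items()}


def multi_center_loss(embeddings, labels, speakers, attractors,
                      m_bona=0.7, m_spoof=0.0, alpha=20.0):
    """Assumed classification step: pull bona fide speech toward its own
    speaker attractor and push spoofs away from the closest attractor."""
    total = 0.0
    for emb, label, spk in zip(embeddings, labels, speakers):
        if label == 0:
            sim = cosine(emb, attractors[spk])
            total += softplus(alpha * (m_bona - sim))
        else:
            # Penalizing the closest attractor pushes the spoof away from all of them.
            sim = max(cosine(emb, a) for a in attractors.values())
            total += softplus(alpha * (sim - m_spoof))
    return total / len(embeddings)


def score(emb, attractors, speaker=None):
    """Higher score means more likely bona fide. Without an enrolled
    speaker, fall back to the closest known attractor (an assumption)."""
    if speaker in attractors:
        return cosine(emb, attractors[speaker])
    return max(cosine(emb, a) for a in attractors.values())
```

In a training loop under these assumptions, `update_attractors` and a gradient step on `multi_center_loss` would alternate, which is one way to read the co-optimization of clustering and classification; the actual schedule is not given in the abstract.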