Automatic speaker verification (ASV) has advanced rapidly in recent years. However, previous work shows that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, while recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task and ignore the subsequent speaker verification process. How to integrate CM and ASV remains an open question. The recent Spoofing-Aware Speaker Verification (SASV) challenge was organized on the argument that better performance can be delivered when the CM and ASV subsystems are optimized jointly. Under the challenge's scenario, the integrated systems proposed by participants must reject both impostor speakers and spoofed utterances of target speakers, which matches the expectation of a reliable, spoofing-robust ASV system. This work focuses on fusion-based SASV solutions and proposes a multi-model fusion framework that leverages multiple state-of-the-art ASV and CM models. The proposed framework reduces the SASV-EER from 8.75% to 1.17%, an 86% relative improvement over the best baseline system in the SASV challenge.
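As a rough illustration of the two ideas above, the following sketch checks the relative-improvement arithmetic and shows one simple way an integrated system can reject both impostors and spoofed inputs. The abstract does not specify the fusion rule, so the product-of-probabilities rule, the function names (`fuse_scores`, `accept`), the score ranges, and the 0.5 threshold are all assumptions made for illustration, not the proposed framework.

```python
import math


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Fractional reduction in error rate relative to the baseline."""
    return (baseline_eer - new_eer) / baseline_eer


def fuse_scores(asv_score: float, cm_score: float) -> float:
    """Hypothetical score-level fusion (an assumption, not the paper's method).

    asv_score: cosine similarity in [-1, 1] between enrollment and test embeddings.
    cm_score:  raw countermeasure logit; higher means more likely bona fide.
    Both scores are mapped to [0, 1] and multiplied, so a low value from
    either subsystem pulls the fused score down.
    """
    asv_prob = (asv_score + 1.0) / 2.0          # linear map from [-1, 1] to [0, 1]
    cm_prob = 1.0 / (1.0 + math.exp(-cm_score))  # sigmoid of the CM logit
    return asv_prob * cm_prob


def accept(asv_score: float, cm_score: float, threshold: float = 0.5) -> bool:
    """Accept only trials whose fused score reaches the (assumed) threshold."""
    return fuse_scores(asv_score, cm_score) >= threshold


if __name__ == "__main__":
    # SASV-EER values quoted in the abstract: 8.75% baseline, 1.17% proposed.
    print(f"relative improvement: {relative_improvement(8.75, 1.17):.3f}")

    # Illustrative trials with made-up scores.
    print("target, bona fide:", accept(0.9, 4.0))
    print("impostor, bona fide:", accept(-0.2, 4.0))
    print("target, spoofed:", accept(0.9, -4.0))
```

With these made-up scores, only the bona fide target trial is accepted: the impostor trial fails on the ASV term and the spoofed trial fails on the CM term, which is the behaviour the SASV scenario asks of an integrated system.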