Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to distinguish spoofing attacks from bona fide speech trials. In practice, however, variability in the acoustic conditions of speech utterances may significantly degrade the performance of CM systems. In this paper, we conduct a cross-dataset study of several state-of-the-art CM systems and observe significant performance degradation compared with their single-dataset performance. Observing differences in the average magnitude spectra of bona fide utterances across the datasets, we hypothesize that channel mismatch among these datasets is an important cause. We then verify this hypothesis by demonstrating a similar degradation in CM systems trained on original data but evaluated on channel-shifted data. Finally, we propose several channel-robust strategies (data augmentation, multi-task learning, and adversarial learning) for CM systems and observe a significant performance improvement in cross-dataset experiments.
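The abstract rests on two signal-level operations: comparing the average magnitude spectrum of utterances across datasets, and producing channel-shifted copies of the data. The following is a minimal sketch of both, assuming NumPy, 16 kHz audio, a 512-point Hann-windowed FFT, and a channel modeled as a linear filter. The function names, the white-noise "utterances", and the 4-tap moving-average impulse response are hypothetical illustrations, not the paper's datasets, feature settings, or channel models.

```python
import numpy as np


def average_magnitude_spectrum(utterances, n_fft=512, hop=160):
    """Average |FFT| over every Hann-windowed frame of every utterance.

    Comparing this curve between datasets gives a coarse view of the
    channel (microphone, codec, transmission) each dataset was recorded with.
    """
    window = np.hanning(n_fft)
    total = np.zeros(n_fft // 2 + 1)
    n_frames = 0
    for x in utterances:
        x = np.asarray(x, dtype=float)
        if len(x) < n_fft:  # zero-pad very short utterances to one full frame
            x = np.pad(x, (0, n_fft - len(x)))
        for start in range(0, len(x) - n_fft + 1, hop):
            frame = x[start:start + n_fft] * window
            total += np.abs(np.fft.rfft(frame))
            n_frames += 1
    return total / n_frames


def channel_shift(x, impulse_response):
    """Simulate a different recording channel by linear filtering (convolution)."""
    return np.convolve(x, impulse_response, mode="same")


# Toy data (assumption): white noise stands in for bona fide speech, 1 s at 16 kHz.
rng = np.random.default_rng(0)
utterances = [rng.standard_normal(16000) for _ in range(10)]

# Hypothetical channel: a 4-tap moving-average FIR, i.e. a crude low-pass filter.
toy_ir = np.ones(4) / 4
shifted = [channel_shift(x, toy_ir) for x in utterances]

orig_spec = average_magnitude_spectrum(utterances)
shifted_spec = average_magnitude_spectrum(shifted)

# Per-bin log ratio in dB: close to 0 dB near DC, strongly negative near Nyquist.
diff_db = 20 * np.log10(shifted_spec / orig_spec)
print(diff_db[:5].round(2), diff_db[-5:].round(2))
```

A channel change shows up as a smooth, dataset-wide tilt in `diff_db` rather than as a change in any single utterance. That is why averaging over many frames is a reasonable first diagnostic. In a training pipeline, applying `channel_shift` with impulse responses drawn from some collection would be one simple form of channel data augmentation. Which impulse responses to use is an open design choice that this sketch does not settle.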