An automatic speaker verification system aims to verify the speaker identity of a speech signal. However, a voice conversion system can manipulate a person's speech signal to make it sound like another speaker's voice and thereby deceive the speaker verification system. Most countermeasures against voice conversion-based spoofing attacks are designed to discriminate bona fide speech from spoofed speech for speaker verification systems. In this paper, we investigate the problem of source speaker identification -- inferring the identity of the source speaker given voice-converted speech. To perform source speaker identification, we simply add voice-converted speech data, labeled with the source speaker's identity, to the genuine speech dataset during speaker embedding network training. Experimental results show that source speaker identification is feasible when training and testing use converted speech from the same voice conversion model(s). In addition, our results demonstrate that training with more converted utterances from a variety of voice conversion models helps improve source speaker identification performance on converted utterances from unseen voice conversion models.
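The data-augmentation step described in the abstract can be illustrated with a minimal sketch. It assumes each converted utterance records its source speaker. The names `Utterance`, `ConvertedUtterance`, and `build_training_set`, as well as the file paths and speaker IDs, are hypothetical and do not come from the paper. The sketch shows only the labeling, not the speaker embedding network itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Utterance:
    """A genuine utterance (hypothetical structure)."""
    path: str
    speaker: str


@dataclass(frozen=True)
class ConvertedUtterance:
    """A voice-converted utterance (hypothetical structure)."""
    path: str
    source_speaker: str  # who actually spoke the original utterance
    target_speaker: str  # whose voice the output imitates
    vc_model: str        # which voice conversion model produced it


def build_training_set(
    genuine: List[Utterance],
    converted: List[ConvertedUtterance],
) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Merge genuine and converted data into (path, class index) pairs.

    Converted utterances are labeled with the SOURCE speaker, not the
    target speaker, so a classifier trained on the result is pushed to
    recover the source identity from converted speech.
    """
    examples = [(u.path, u.speaker) for u in genuine]
    examples += [(c.path, c.source_speaker) for c in converted]

    # Assign a stable integer class index to every speaker label.
    speakers = sorted({label for _, label in examples})
    label_to_index = {name: i for i, name in enumerate(speakers)}

    indexed = [(path, label_to_index[label]) for path, label in examples]
    return indexed, label_to_index


# Illustrative, made-up data.
genuine = [
    Utterance("spk_a/001.wav", "spk_a"),
    Utterance("spk_b/001.wav", "spk_b"),
]
converted = [
    ConvertedUtterance("vc/a_to_b_001.wav", "spk_a", "spk_b", "vc_model_1"),
]
train_examples, label_map = build_training_set(genuine, converted)
```

In this sketch the converted file `vc/a_to_b_001.wav` shares a class index with the genuine utterances of `spk_a`, even though it sounds like `spk_b`. The `vc_model` field is not used for labeling; it is kept so that utterances from particular voice conversion models could be held out to form an unseen-model test set.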