An automatic speaker verification system aims to verify the identity of the speaker in a speech signal. A voice conversion system, however, can manipulate a source speaker's speech so that it sounds like a target speaker's voice and thereby deceives the speaker verification system. Most countermeasures against voice conversion-based spoofing attacks are designed to discriminate bona fide speech from spoofed speech for speaker verification systems. In this paper, we investigate the problem of source speaker identification: inferring the identity of the source speaker given voice-converted speech. To perform source speaker identification, we simply add voice-converted speech, labeled with the source speaker's identity, to the genuine speech dataset when training the speaker embedding network. Experimental results show that source speaker identification is feasible when training and testing use converted speech from the same voice conversion model(s). When testing on converted speech from an unseen voice conversion algorithm, source speaker identification performance improves as more voice conversion models are used during training.
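The labeling scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Utterance` record, the `build_training_set` helper, and the file paths are hypothetical names introduced here for illustration only. The sketch shows the one idea stated in the abstract, namely that converted speech enters the training set under the source speaker's label rather than the target speaker's.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record for one training utterance (illustration only)."""
    path: str
    source_speaker: str                    # speaker who actually produced the speech
    target_speaker: Optional[str] = None   # set only for voice-converted speech
    vc_model: Optional[str] = None         # conversion model name, if any


def build_training_set(
    genuine: List[Utterance],
    converted: List[Utterance],
) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Merge genuine and converted speech into (path, class index) pairs.

    Converted utterances are labeled with their SOURCE speaker, so the
    embedding network is trained to recover the source identity.
    """
    utterances = list(genuine) + list(converted)
    speakers = sorted({u.source_speaker for u in utterances})
    label_map = {spk: idx for idx, spk in enumerate(speakers)}
    examples = [(u.path, label_map[u.source_speaker]) for u in utterances]
    return examples, label_map
```

Under these assumptions, an utterance converted from `spk_a` to sound like `spk_b` would receive the same class index as genuine `spk_a` speech. Converted data from several voice conversion models can be passed in together through the `converted` list.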