The use of modern vocoders in an analysis/synthesis pipeline allows us to investigate high-quality voice conversion that can be used for privacy purposes. Here, we propose to transform the speaker embedding and the pitch in order to hide the sex of the speaker. An ECAPA-TDNN-based speaker representation, fed into a HiFiGAN vocoder, is protected using a neural-discriminant analysis approach that is consistent with the zero-evidence concept of privacy. This approach significantly reduces the information in speech related to the speaker's sex while preserving the speech content and some consistency in the resulting protected voices.