Modern vocoders in an analysis/synthesis pipeline make it possible to investigate high-quality voice conversion for privacy purposes. Here, we propose transforming the speaker embedding and the pitch in order to hide the sex of the speaker. An ECAPA-TDNN-based speaker representation, fed into a HiFiGAN vocoder, is protected using a neural-discriminant analysis approach that is consistent with the zero-evidence concept of privacy. This approach significantly reduces the information in speech related to the speaker's sex while preserving the speech content and some consistency in the resulting protected voices.