Voice conversion (VC) refers to changing the timbre of an utterance while retaining its linguistic content. Recently, many works have focused on disentanglement-based learning techniques that separate timbre from linguistic content in a speech signal. Once the two are successfully separated, voice conversion becomes feasible and straightforward. This paper proposes a novel one-shot voice conversion framework, called AVQVC, based on vector quantization voice conversion (VQVC) and AutoVC. A new training method is applied to VQVC to separate content and timbre information from speech more effectively. The results show that this approach separates content and timbre better than VQVC, which improves the sound quality of the generated speech.
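As a rough illustration of the vector-quantization idea that VQVC-style methods build on, the following Python sketch splits a sequence of feature frames into a quantized part (treated as content) and a time-averaged residual (treated as a timbre estimate). It is not the AVQVC model: the encoder, the decoder, and the new training method are all omitted. The function names (`nearest_code`, `split_content_timbre`, `convert`), the toy codebook, and the choice of the mean quantization residual as the timbre estimate are assumptions made for illustration only.

```python
from typing import List, Tuple

Vector = List[float]


def nearest_code(frame: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to `frame` (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def split_content_timbre(
    frames: List[Vector], codebook: List[Vector]
) -> Tuple[List[int], List[Vector], Vector]:
    """Quantize each frame and split the sequence into content and timbre.

    Assumption: the quantized frames stand in for content, and the residual
    (frame minus its code) averaged over time stands in for timbre.
    """
    codes = [nearest_code(f, codebook) for f in frames]
    content = [codebook[k] for k in codes]
    dim = len(frames[0])
    timbre = [
        sum(f[d] - c[d] for f, c in zip(frames, content)) / len(frames)
        for d in range(dim)
    ]
    return codes, content, timbre


def convert(
    source_frames: List[Vector],
    target_frames: List[Vector],
    codebook: List[Vector],
) -> List[Vector]:
    """Combine source content with target timbre.

    A real system would use a learned decoder; plain addition is a
    hypothetical stand-in so the sketch stays self-contained.
    """
    _, content, _ = split_content_timbre(source_frames, codebook)
    _, _, timbre = split_content_timbre(target_frames, codebook)
    return [[c[d] + timbre[d] for d in range(len(timbre))] for c in content]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]          # toy two-entry codebook
    source = [[0.1, 0.1], [0.9, 1.1]]            # frames from the source utterance
    target = [[0.5, 0.0], [1.5, 1.0]]            # frames from the one-shot target utterance
    print(convert(source, target, codebook))
```

In this toy setting a single target utterance is enough to estimate the timbre vector, which mirrors the one-shot setting described in the abstract; how well the split works in practice depends on how the codebook and the encoder are trained.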