Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

Unsupervised zero-shot voice conversion (VC) aims to modify the speaker characteristics of an utterance to match an unseen target speaker without relying on parallel training data. Recently, self-supervised learning of speech representations has been shown to produce useful linguistic units without using transcripts, and these units can be passed directly to a VC model. In this paper, we show that high-quality audio samples can be achieved by using a length resampling decoder, which enables the VC model to work in conjunction with different linguistic feature extractors and vocoders without requiring them to operate on the same sequence length. We show that our method outperforms many baselines on the VCTK dataset. Without modifying the architecture, we further demonstrate that a) using pairs of different audio segments from the same speaker, b) adding a cycle consistency loss, and c) adding a speaker classification loss can help to learn a better speaker embedding. Our model trained on LibriTTS using these techniques achieves the best performance, producing audio samples that transfer well to the target speaker's voice while preserving linguistic content at a level comparable to actual human utterances in terms of Character Error Rate.
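To make the idea of length resampling concrete, the following is a minimal sketch, not the paper's decoder. The abstract does not say how the resampling is performed; linear interpolation along the time axis is an assumption made here for illustration, and the function name `resample_length` is hypothetical. It shows how a feature sequence produced at one frame rate can be stretched or shrunk to the number of frames a vocoder expects.

```python
from typing import List


def resample_length(frames: List[List[float]], target_len: int) -> List[List[float]]:
    """Linearly interpolate a (T, D) feature sequence to (target_len, D).

    Illustrative assumption only: the paper's actual decoder may resample
    differently (for example with learned layers).
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    src_len = len(frames)
    if src_len == 0:
        raise ValueError("frames must not be empty")
    # Degenerate cases: nothing to interpolate between.
    if src_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    if target_len == 1:
        return [list(frames[0])]

    # Map output index i onto a fractional position in the source sequence,
    # keeping the first and last frames aligned.
    scale = (src_len - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), src_len - 2)  # left neighbour index
        w = pos - lo                     # weight of the right neighbour
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


# Three source frames stretched to five output frames.
upsampled = resample_length([[0.0], [2.0], [4.0]], 5)
# Five source frames shrunk to three output frames.
downsampled = resample_length([[0.0], [1.0], [2.0], [3.0], [4.0]], 3)
```

Because only the sequence length changes, a step of this kind lets the feature extractor and the vocoder keep their own hop sizes, which is the decoupling the abstract describes.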

