The task of few-shot style transfer for voice cloning in text-to-speech (TTS) synthesis aims at transferring the speaking style of an arbitrary source speaker to a target speaker's voice using only a very limited amount of neutral data. This is a very challenging task, since the learning algorithm has to handle few-shot voice cloning and speaker-prosody disentanglement at the same time. Accelerating the adaptation process for a new target speaker is important in real-world applications, but even more challenging. In this paper, we approach the hard task of fast few-shot style transfer for voice cloning using meta-learning. We investigate the model-agnostic meta-learning (MAML) algorithm and meta-transfer a pre-trained multi-speaker, multi-prosody base TTS model so that it becomes highly sensitive to adaptation with few samples. A domain adversarial training mechanism and an orthogonal constraint are adopted to disentangle speaker and prosody representations for effective cross-speaker style transfer. Experimental results show that the proposed approach can perform fast voice cloning using only 5 samples (around 12 seconds of speech) from a target speaker, with only 100 adaptation steps. Audio samples are available online.
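To make the meta-transfer idea concrete, the sketch below shows a minimal first-order MAML training loop in PyTorch. It is an illustration under simplifying assumptions, not the authors' implementation: the toy regressor stands in for the multi-speaker base TTS model, `sample_task` is a hypothetical stand-in for drawing a few-shot support/query split from one speaker, and the domain adversarial training and orthogonal constraint for speaker-prosody disentanglement are omitted.

```python
# Minimal first-order MAML sketch (illustration only; model, data, and
# hyper-parameters are placeholders, not the paper's actual setup).
import copy
import torch
import torch.nn as nn

# Toy regressor standing in for the pre-trained base TTS acoustic model.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 80))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()
inner_lr, inner_steps = 1e-3, 5  # few adaptation steps per task

def sample_task():
    """Hypothetical stand-in: draw a (support, query) split for one speaker."""
    x = torch.randn(8, 16)
    y = torch.randn(8, 80)
    return (x[:5], y[:5]), (x[5:], y[5:])  # 5 support samples, 3 query samples

for meta_step in range(100):
    meta_opt.zero_grad()
    (xs, ys), (xq, yq) = sample_task()

    # Inner loop: adapt a copy of the model on the few support samples.
    fast = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        loss_fn(fast(xs), ys).backward()
        inner_opt.step()

    # Outer loop (first-order approximation): evaluate the adapted copy on
    # query data and apply its gradients to the meta-parameters.
    loss_fn(fast(xq), yq).backward()
    for p, fp in zip(model.parameters(), fast.parameters()):
        p.grad = fp.grad.clone()
    meta_opt.step()
```

After meta-training, cloning a new voice would correspond to running only the inner loop on that speaker's handful of samples, which is what makes adaptation with roughly 100 steps plausible.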