Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems. In this work, we present a parallel end-to-end TTS method that generates more natural-sounding audio than current two-stage models. Our method adopts variational inference augmented with normalizing flows and an adversarial training process, which improves the expressive power of generative modeling. We also propose a stochastic duration predictor to synthesize speech with diverse rhythms from input text. With the uncertainty modeling over latent variables and the stochastic duration predictor, our method expresses the natural one-to-many relationship in which a text input can be spoken in multiple ways with different pitches and rhythms. A subjective human evaluation (mean opinion score, or MOS) on LJ Speech, a single-speaker dataset, shows that our method outperforms the best publicly available TTS systems and achieves a MOS comparable to ground truth.
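For readers unfamiliar with the metric, a mean opinion score is the average of listener ratings, usually reported with a confidence interval. The sketch below is only an illustration: the function name `mean_opinion_score`, the 1-5 rating scale, the normal-approximation 95% interval, and the ratings themselves are assumptions made here, not details or data taken from the paper.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper: assumes independent ratings and a normal
    approximation for the interval (z = 1.96).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.fmean(ratings)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * sem


# Hypothetical listener ratings on a 1-5 naturalness scale (not paper data).
ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Two systems are typically treated as comparable when their intervals overlap, which is the sense in which a synthesized-speech MOS can be "comparable to ground truth".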

Related content

Speech synthesis

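Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis; the quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information announcement systems at banks and hospitals, car navigation systems, and automated-answering call centers, yielding substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, its applications are extending into entertainment, language teaching, rehabilitation therapy, and other areas. Speech synthesis can be said to be influencing every aspect of people's lives.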
