学习到从文本中朗读: 零热多语种语言文本到语音, 不受监督的文本预科培训 (Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining)

While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data. This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language. The use of text-only data allows the development of TTS systems for low-resource languages for which only textual resources are available, making TTS accessible to thousands of languages. Inspired by the strong cross-lingual transferability of multilingual language models, our framework first performs masked language model pretraining with multilingual text-only data. Then we train this model with a paired data in a supervised manner, while freezing a language-aware embedding layer. This allows inference even for languages not included in the paired data but present in the text-only data. Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language. All experiments were conducted using public datasets and the implementation will be made available for reproducibility.

翻译：虽然音质文本到语音(TTS)已经实现了人性化自然合成言语,多语言TTS系统由于需要配对文本和工作室质量的音频数据,局限于资源丰富的语言。本文件建议了一种方法,用于使用目标语言的纯文本数据,零点多语言TTS。只使用文本数据,就可以为只有文字资源的低资源语言开发TTS系统,使数千种语言能够使用TTS。在多种语言模式的强大跨语言可传输性激励下,我们的框架首先用多语言文本数据进行隐蔽语言模型预培训。然后,我们以监督的方式用配对数据来培训这个模型,同时冻结语言认知嵌入层。这允许人们推断即使在配对数据中不包括但只文本数据中存在的语言。评价结果显示,对于一种看不见的语言,其性能差率低于12%的零点TSTS。所有实验都是利用公共数据集进行的,实施过程将允许人们重新理解。

相关内容

语音合成

关注 491

语音合成（Speech Synthesis），也称为文语转换（Text-to-Speech, TTS,它是将任意的输入文本转换成自然流畅的语音输出。语音合成涉及到人工智能、心理学、声学、语言学、数字信号处理、计算机科学等多个学科技术，是信息处理领域中的一项前沿技术。随着计算机技术的不断提高，语音合成技术从早期的共振峰合成,逐步发展为波形拼接合成和统计参数语音合成，再发展到混合语音合成；合成语音的质量、自然度已经得到明显提高，基本能满足一些特定场合的应用需求。目前，语音合成技术在银行、医院等的信息播报系统、汽车导航系统、自动应答呼叫中心等都有广泛应用，取得了巨大的经济效益。另外，随着智能手机、MP3、PDA 等与我们生活密切相关的媒介的大量涌现，语音合成的应用也在逐渐向娱乐、语音教学、康复治疗等领域深入。可以说语音合成正在影响着人们生活的方方面面。

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日