ERNIE-SAT: 跨语言多语音多语种语音语音对语音的语音和文本联合预科培训 (ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech)

Speech representation learning has improved both speech understanding and speech synthesis tasks for single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both the training and the inference without any finetuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods.

翻译：语言语言教学提高了单一语言的语言理解和语言合成任务,然而,没有探讨其跨语言情景的能力。在本文中,我们扩展了跨语言多语种语言语言语言合成任务的培训前方法,包括跨语言多语种语音克隆和跨语言多语种语言语音编辑。我们提议了一个语言文本联合培训框架,其中我们随机遮盖了光谱和配有语音示例的电话及其抄录。通过学习以不同语言重建内容中隐蔽部分的内容,我们的模型展示了与基于语音的多语种语言语音合成技术方法相比的巨大改进。此外,我们的框架是培训和推断的端对端,不作任何微调。在跨语言多语种语言的语音克隆和跨语言多语种语音编辑任务中,我们的实验显示我们的模型超越了基于语言组合的多语种语音技术方法。

相关内容

语音合成

关注 491

语音合成（Speech Synthesis），也称为文语转换（Text-to-Speech, TTS,它是将任意的输入文本转换成自然流畅的语音输出。语音合成涉及到人工智能、心理学、声学、语言学、数字信号处理、计算机科学等多个学科技术，是信息处理领域中的一项前沿技术。随着计算机技术的不断提高，语音合成技术从早期的共振峰合成,逐步发展为波形拼接合成和统计参数语音合成，再发展到混合语音合成；合成语音的质量、自然度已经得到明显提高，基本能满足一些特定场合的应用需求。目前，语音合成技术在银行、医院等的信息播报系统、汽车导航系统、自动应答呼叫中心等都有广泛应用，取得了巨大的经济效益。另外，随着智能手机、MP3、PDA 等与我们生活密切相关的媒介的大量涌现，语音合成的应用也在逐渐向娱乐、语音教学、康复治疗等领域深入。可以说语音合成正在影响着人们生活的方方面面。

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

50+阅读 · 2022年10月2日

语音识别:不同深度学习方法的综述，Speech Recognition: a review of the different deep learning approaches

专知会员服务

33+阅读 · 2022年3月13日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日