Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

In most cases, bilingual TTS needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language usually differ considerably because of the influence of the first language. It is therefore a major challenge to model the pronunciation and intonation of the second language accurately in different contexts without mutual interference. This paper builds a Mandarin-English TTS system to obtain more standard spoken English from a monolingual Chinese speaker. We introduce a phonology embedding to capture how English differs across phonologies. An embedding mask is applied to the language embedding to separate information between languages, and to the phonology embedding to focus it on English expression. We also design an embedding strength modulator to capture the dynamic strength of language and phonology. Experiments show that our approach produces significantly more natural and standard spoken English from the monolingual Chinese speaker. Our analysis finds that suitable phonology control contributes to better performance in different scenarios.
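The abstract does not spell out how the embedding mask and the strength modulator are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a per-token language tag, a single phonology embedding vector that is added only to English tokens (the mask), and a per-token scalar strength produced by a sigmoid over a linear projection of the token's hidden state. The names `modulate`, `add_masked_embedding`, `weights`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def modulate(hidden, weights, bias):
    """Hypothetical strength modulator: maps a token's hidden state to a scalar in (0, 1)."""
    return sigmoid(sum(h * w for h, w in zip(hidden, weights)) + bias)


def add_masked_embedding(hidden_seq, lang_ids, embedding, target_lang, weights, bias):
    """Add `embedding` to each token of `target_lang`, scaled by a per-token strength.

    Tokens in any other language get mask = 0, so their hidden state is unchanged.
    """
    out = []
    for hidden, lang in zip(hidden_seq, lang_ids):
        mask = 1.0 if lang == target_lang else 0.0
        strength = modulate(hidden, weights, bias)
        out.append([h + mask * strength * e for h, e in zip(hidden, embedding)])
    return out


# Toy example: one Mandarin token followed by one English token.
hidden_seq = [[0.0, 0.0], [1.0, 1.0]]
lang_ids = ["zh", "en"]
phonology_embedding = [0.5, -0.5]   # assumed, illustrative values
weights, bias = [0.0, 0.0], 0.0     # zero weights give a constant strength of 0.5

mixed = add_masked_embedding(hidden_seq, lang_ids, phonology_embedding, "en", weights, bias)
print(mixed)
```

In this toy setup the Mandarin token passes through untouched, while the English token receives half of the phonology embedding. In a trained system the modulator weights would be learned, so the strength could vary from token to token.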

Related content

Speech synthesis

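Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on many disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information broadcast systems at banks and hospitals, in car navigation systems, and in automated call centers, with substantial economic benefits. In addition, with the proliferation of media closely tied to daily life, such as smartphones, MP3 players, and PDAs, applications of speech synthesis are extending into entertainment, language teaching, rehabilitation therapy, and other areas. Speech synthesis can be said to be influencing every aspect of people's lives.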
