MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Neural network based end-to-end text-to-speech (TTS) has greatly improved the quality of synthesized speech. However, how to efficiently use massive amounts of spontaneous speech without transcription remains an open problem. In this paper, we propose MHTTS, a fast multi-speaker TTS system that is robust to transcription errors and to speech data with varied speaking styles. Specifically, we introduce a multi-head model and, by training jointly on both kinds of data, transfer text information from a high-quality corpus with manual transcription to spontaneous speech with imperfectly recognized transcription. MHTTS has three advantages: 1) it synthesizes higher-quality multi-speaker speech with faster inference; 2) it can transfer correct text information to data with imperfect transcription, whether the errors are simulated by corruption or produced by an automatic speech recognizer (ASR); 3) it can exploit massive amounts of real spontaneous speech with imperfect transcription to synthesize expressive speech.
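The abstract mentions imperfect transcription "simulated using corruption" but does not say how the corruption is applied. As a minimal sketch, assuming a token-level scheme with random substitutions, deletions, and insertions, one way to simulate noisy transcripts could look like the following. The function name `corrupt_transcript`, the `error_rate` parameter, and the replacement `vocab` are hypothetical and are not taken from the paper.

```python
import random


def corrupt_transcript(tokens, error_rate, vocab, seed=0):
    """Return a noisy copy of `tokens` (illustrative only, not the paper's method).

    Each token is corrupted with probability `error_rate`. A corrupted token
    is substituted, deleted, or followed by an inserted token, with the
    operation chosen uniformly at random. `vocab` supplies replacement tokens.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    corrupted = []
    for token in tokens:
        if rng.random() >= error_rate:
            corrupted.append(token)  # keep the token unchanged
            continue
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute":
            corrupted.append(rng.choice(vocab))
        elif op == "insert":
            corrupted.append(token)
            corrupted.append(rng.choice(vocab))
        # "delete": the token is simply dropped
    return corrupted


if __name__ == "__main__":
    clean = "the quick brown fox jumps over the lazy dog".split()
    filler = ["a", "of", "box", "dock"]
    print(corrupt_transcript(clean, error_rate=0.3, vocab=filler, seed=1))
```

Varying `error_rate` gives transcripts of controlled quality, which is one plausible way to probe robustness before moving to real ASR output.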


Related content

Speech synthesis


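Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on many disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information announcement systems at banks and hospitals, in car navigation systems, and in automated call centers, where it has yielded substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to daily life, applications of speech synthesis are gradually extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can be said to be affecting every aspect of people's lives.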
