A study on the efficacy of model pre-training in developing neural text-to-speech system

In the development of neural text-to-speech systems, model pre-training with a large amount of non-target speakers' data is a common approach. However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data. This study aims to understand better why and how model pre-training can positively contribute to TTS system performance. It is postulated that the pre-training process plays a critical role in learning text-related variation in speech, while further training with the target speaker's data aims to capture the speaker-related variation. Different test sets are created with varying degrees of similarity to target speaker data in terms of text content. Experiments show that leveraging a speaker-independent TTS trained on speech data with diverse text content can improve the target speaker TTS on domain-mismatched text. We also attempt to reduce the amount of pre-training data for a new text domain and improve the data and computational efficiency. It is found that the TTS system could achieve comparable performance when the pre-training data is reduced to 1/8 of its original size.



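Speech synthesis

Speech synthesis, also called text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on multiple disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information announcement systems at banks and hospitals, in car navigation systems, and in automated call centers, where it has delivered substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, its applications are extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can be said to be influencing every aspect of people's lives.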
