KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics

We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In the new KazakhTTS2 corpus, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three females and two males), and the topic coverage has been diversified with the help of new sources, including a book and Wikipedia articles. This corpus is necessary for building high-quality TTS systems for Kazakh, a Central Asian agglutinative language from the Turkic family, which presents several linguistic challenges. We describe the corpus construction process and provide the details of the training and evaluation procedures for the TTS system. Our experimental results indicate that the constructed corpus is sufficient to build robust TTS models for real-world applications, with a subjective mean opinion score ranging from 3.6 to 4.2 across all five speakers. We believe that our corpus will facilitate speech and language research for Kazakh and other Turkic languages, which are widely considered to be low-resource due to the limited availability of free linguistic data. The constructed corpus, code, and pretrained models are publicly available in our GitHub repository.


