We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In the new KazakhTTS2 corpus, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three female and two male), and the topic coverage has been diversified with new sources, including a book and Wikipedia articles. The corpus is needed to build high-quality TTS systems for Kazakh, a Central Asian agglutinative language of the Turkic family that presents several linguistic challenges. We describe the corpus construction process and detail the training and evaluation procedures for the TTS system. Our experimental results indicate that the constructed corpus is sufficient to build robust TTS models for real-world applications, with a subjective mean opinion score above 4.0 for all five speakers. We believe that our corpus will facilitate speech and language research for Kazakh and other Turkic languages, which are widely considered low-resource due to the limited availability of free linguistic data. The constructed corpus, code, and pretrained models are publicly available in our GitHub repository.