ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus

At present, text-to-speech (TTS) systems trained on high-quality transcribed speech data with end-to-end neural models can generate speech that is intelligible, natural, and closely resembles human speech. These models are trained on relatively large, single-speaker, professionally recorded audio, typically extracted from audiobooks. Because freely available speech corpora of this kind are scarce, a large gap exists in Arabic TTS research and development. Most existing freely available Arabic speech corpora are unsuitable for TTS training because they contain multi-speaker casual speech with varying recording conditions and quality, whereas the corpora curated for speech synthesis are generally small and not suitable for training state-of-the-art end-to-end models. As a step towards filling this resource gap, we present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech is extracted from a LibriVox audiobook and then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker sampled at 40,100 Hz (40.1 kHz). In this paper, we describe the process of corpus creation and provide corpus statistics and a comparison with existing resources. Furthermore, we develop two TTS systems based on Grad-TTS and Glow-TTS and illustrate the performance of the resulting systems via subjective and objective evaluations. The corpus will be made publicly available at www.clartts.com for research purposes, along with a demo of the baseline TTS systems.
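The corpus size quoted in the abstract (about 12 hours at a 40.1 kHz sampling rate) comes down to simple arithmetic: each clip lasts its frame count divided by its sampling rate, and the corpus size is the sum over clips. Below is a minimal Python sketch of that calculation using only the standard library. It assumes the segmented clips are stored as PCM WAV files, which the abstract does not state. The function names, file names, and silent placeholder clips are hypothetical and are not taken from the ClArTTS release.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration of one WAV file in seconds: frame count / sampling rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def corpus_hours(paths):
    """Total duration of all listed WAV files, converted from seconds to hours."""
    return sum(wav_duration_seconds(p) for p in paths) / 3600.0


def write_silence(path, seconds, sample_rate=40100):
    """Write a mono 16-bit silent WAV file (placeholder data, not real speech)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))


if __name__ == "__main__":
    # Hypothetical stand-in clips; a real run would list the corpus segments.
    with tempfile.TemporaryDirectory() as tmp:
        clips = [os.path.join(tmp, f"clip_{i}.wav") for i in range(3)]
        for clip in clips:
            write_silence(clip, seconds=4)
        print(f"{corpus_hours(clips) * 3600:.1f} seconds in total")
```

Reading the sampling rate from each file header, rather than hard-coding 40,100 Hz, keeps the total correct even if some clips were resampled.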


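Related content

Speech synthesis

Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on many disciplines, including artificial intelligence, psychology, acoustics, linguistics, digital signal processing, and computer science, and is a frontier technology in information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain application scenarios. Today, speech synthesis is widely used in information announcement systems at banks and hospitals, car navigation systems, and automated call centers, where it has yielded substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, its applications are extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis now touches many aspects of people's lives.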
