Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources and subject-matter expertise. Next, we create new datasets and curate datasets from "found" data (existing recordings) through a participatory approach while considering accessibility, quality, and breadth. We demonstrate that we can develop synthesizers that generate intelligible speech with 25 minutes of created speech, even when recorded in suboptimal environments. Finally, we release the speech data, code, and trained voices for 12 African languages to support researchers and developers.