Speech technology is becoming ever more ubiquitous with the advance of speech-enabled devices and services. The use of speech synthesis in Augmentative and Alternative Communication tools has facilitated the inclusion of individuals with speech impediments, allowing them to communicate with their surroundings using speech. Although there are numerous speech synthesis systems for the world's most widely spoken languages, availability for smaller languages remains limited. We propose and compare three speech synthesis models for Macedonian, built using parametric and deep learning techniques and trained on a newly recorded corpus. We target low-resource edge deployment for Augmentative and Alternative Communication and assistive technologies, such as communication boards and screen readers. The listening test results show that parametric speech synthesis performs on par with the more advanced deep learning models. Since it also requires fewer resources and offers full control over speech rate and pitch, it is the preferred choice for building a Macedonian TTS system for this application scenario.