Developing speech technologies is a challenge for low-resource languages, for which both annotated and raw speech data are sparse. Maltese is one such language. Recent years have seen increased interest in the computational processing of Maltese, including speech technologies, but resources for the latter remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for such languages, using Maltese as a test case. We consider three types of data augmentation: unsupervised training, multilingual training, and the use of synthesized speech as training data. The goal is to determine which of these techniques, or which combination of them, is most effective for improving speech recognition for languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the three data augmentation techniques studied here yields an absolute word error rate (WER) improvement of 15% without the use of a language model.
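Since the headline result is stated as an *absolute* WER improvement, it may help to recall how WER is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. An absolute improvement is the plain difference between two WERs, whereas a relative improvement divides that difference by the baseline WER. The sketch below is a minimal illustration under those standard definitions, not the evaluation code used in the paper; the function names `word_error_rate`, `absolute_improvement` and `relative_improvement` are hypothetical, and it assumes transcripts are already normalized and split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words.

    Assumes (hypothetically) that both transcripts are already normalized,
    so a simple whitespace split yields the word tokens.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Absolute WER reduction, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """WER reduction expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy English sentences, purely illustrative (not data from the paper).
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

With purely hypothetical numbers, a drop from 40% to 25% WER is an absolute improvement of 15 points but a relative improvement of 37.5%, which is why the distinction matters when comparing reported results.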