Recent years have seen increased interest in the computational speech processing of Maltese, but resources remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. We consider three types of data augmentation: unsupervised training, multilingual training, and the use of synthesized speech as training data. The goal is to determine which of these techniques, or which combination of them, is most effective for improving speech recognition in languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the data augmentation techniques studied here leads to an absolute word error rate (WER) improvement of 15% without the use of a language model.
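To make the reported metric concrete, the following is a minimal sketch of how WER and an absolute (as opposed to relative) improvement are commonly computed. It is not the authors' evaluation code: the function names are hypothetical, whitespace tokenization is an assumption, and the baseline and improved WER values in the usage lines are made-up numbers chosen only to illustrate the arithmetic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Reduction in WER as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not taken from the paper):
baseline, improved = 0.50, 0.35
abs_gain = absolute_improvement(baseline, improved)  # 15 percentage points
rel_gain = relative_improvement(baseline, improved)  # 30% of the baseline
```

The distinction matters when reading the result above: an absolute improvement of 15% means the WER dropped by 15 percentage points, which is a larger change than a 15% relative reduction unless the baseline WER is 100%.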