Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the data required to build them exists for only a few languages. We experimented with two techniques that may provide pathways to large-vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open-source data, collected additional data for 15 languages, and trained experimental models using these techniques. Our results show that pooling the small amounts of available data in multilingual end-to-end models, and pre-training on unlabeled data, can improve speech recognition quality for many African languages.