Language is inherent and essential to human communication. Whether written or spoken, it enables understanding between people of the same and of different regions. With growing awareness of, and effort toward, including more low-resourced languages in NLP research, African languages have recently become a major subject of research in machine translation and other text-based areas of NLP. However, there is still very little comparable research on speech recognition for African languages. Interestingly, some of the unique properties of African languages that affect NLP, such as their diacritical and tonal complexities, are largely rooted in speech. This suggests that careful speech interpretation could give more intuition on how to handle the linguistic complexities of African languages in text-based NLP. OkwuGbé is a step towards building speech recognition systems for low-resourced African languages. Using Fon and Igbo as case studies, we conduct a comprehensive linguistic analysis of each language and describe the creation of end-to-end, deep neural network-based speech recognition models for both languages. We present a state-of-the-art ASR model for Fon, as well as benchmark ASR model results for Igbo. Our linguistic analyses of Fon and Igbo provide valuable insights and guidance for building speech recognition models for other low-resourced African languages, and can guide future NLP research for Fon and Igbo. The source code for the Fon and Igbo models has been made publicly available.