Building automatic speech recognition (ASR) systems is challenging, especially for under-resourced languages, which lack sufficient training data and require corpora to be constructed nearly from scratch. Several indigenous African languages, including Kiswahili, are technologically under-resourced. ASR systems are crucial, particularly for hearing-impaired persons, who can benefit from transcripts in their native languages. However, the absence of transcribed speech datasets has hampered efforts to develop ASR models for these languages. This paper describes the transcription process and the development of a Kiswahili speech corpus that includes both read texts and spontaneous speech from native Kiswahili speakers. The study also discusses the vowels and consonants of Kiswahili and provides an updated Kiswahili phoneme dictionary for the ASR model, which was built with CMU Sphinx, an open-source speech recognition toolkit. The ASR model was trained using an extended phonetic set and yielded a word error rate (WER) of 18.87% and a sentence error rate (SER) of 49.5%, an improvement over previous similar research on under-resourced languages.
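For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how WER and SER are conventionally defined: WER is the word-level edit distance between reference and hypothesis divided by the number of reference words, and SER is the fraction of sentences containing at least one error. This is an illustration under stated assumptions, not the scoring procedure used in the study; the function names and the sample Kiswahili sentences are hypothetical and are not drawn from the corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_and_ser(references, hypotheses):
    """Return (WER, SER) for parallel lists of whitespace-tokenised sentences.

    Assumes the lists have equal length and the references contain at least one word.
    """
    total_errors = 0
    total_words = 0
    wrong_sentences = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        hyp_tokens = hyp.split()
        errors = edit_distance(ref_tokens, hyp_tokens)
        total_errors += errors
        total_words += len(ref_tokens)
        if errors > 0:
            wrong_sentences += 1
    return total_errors / total_words, wrong_sentences / len(references)


# Hypothetical example sentences, for illustration only.
references = ["habari ya asubuhi", "ninapenda kusoma vitabu"]
hypotheses = ["habari za asubuhi", "ninapenda kusoma vitabu"]
wer, ser = wer_and_ser(references, hypotheses)
print(f"WER = {wer:.2%}, SER = {ser:.2%}")
```

Because a single wrong word makes the whole sentence count as wrong, SER is always at least as pessimistic as WER on the same data, which is consistent with the gap between the two figures reported above.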