Building a usable radio-monitoring automatic speech recognition (ASR) system is a challenging task for under-resourced languages, yet it is paramount in societies where radio is the main medium of public communication and discussion. Initial efforts by the United Nations in Uganda have shown how important it is for national planning to understand the perceptions of rural people who are excluded from social media. However, these efforts are hampered by the absence of transcribed speech datasets. In this paper, the Makerere Artificial Intelligence research lab releases a Luganda radio speech corpus of 155 hours. To our knowledge, this is the first publicly available radio dataset in sub-Saharan Africa. The paper describes the development of the voice corpus and presents baseline Luganda ASR performance results obtained with Coqui STT, an open-source speech recognition toolkit.