Designing a natural voice interface relies mostly on speech recognition for interaction between humans and their modern digital devices. In addition, speech recognition narrows the gap between monolingual individuals, helping them communicate. However, the field lacks broad support for several widely spoken languages and their dialects, even though most daily conversations are carried out in them. This paper investigates the viability of designing an Automatic Speech Recognition model for the Sudanese dialect, one of the dialects of Arabic, whose complexity is a product of historical and social conditions unique to its speakers. These conditions are reflected in both the form and content of the dialect. The paper therefore gives an overview of the Sudanese dialect and describes the collection of representative resources and the pre-processing performed to construct a modest dataset that addresses the lack of annotated data. It also proposes an end-to-end speech recognition model designed using Convolutional Neural Networks. The Sudanese dialect dataset is intended as a stepping stone for future Natural Language Processing research targeting the dialect. The designed model provided some insights into the current recognition task and reached an average Label Error Rate of 73.67%.
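For readers unfamiliar with the reported metric, the following is a minimal sketch of how an average Label Error Rate can be computed. It assumes the common definition, in which the edit distance between the predicted and reference label sequences is divided by the reference length and averaged over utterances. The abstract does not state the exact definition used, and the function names `edit_distance` and `average_label_error_rate` are illustrative rather than taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (strings or lists)."""
    # prev[j] holds the distance between the first i-1 labels of ref
    # and the first j labels of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a label from ref
                curr[j - 1] + 1,     # insert a label into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def average_label_error_rate(pairs):
    """Mean of edit_distance / len(reference) over (ref, hyp) pairs, in percent.

    Assumed definition; pairs with an empty reference are skipped.
    """
    rates = [edit_distance(ref, hyp) / len(ref) for ref, hyp in pairs if len(ref) > 0]
    if not rates:
        raise ValueError("no non-empty reference sequences")
    return 100.0 * sum(rates) / len(rates)
```

Under this definition the value can exceed 100% when the hypothesis is much longer than the reference, so a figure such as 73.67% means that, on average, roughly three edits were needed per four reference labels.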