The performance of data-driven natural language processing systems is contingent upon the quality of the corpora they are built on. However, principal corpus design criteria are often not adequately identified and examined, particularly in speech processing. Speech corpus development requires additional attention to factors such as clean versus noisy recordings, read versus spontaneous speech, multi-talker speech, and accents or dialects. Domain selection is also a crucial decision point in speech corpus development. In this study, we demonstrate the significance of domain selection by assessing a state-of-the-art Bangla automatic speech recognition (ASR) model on BanSpeech, a novel multi-domain Bangladeshi Bangla ASR evaluation benchmark that contains 7.2 hours of speech and 9802 utterances from 19 distinct domains. The ASR model is a deep convolutional neural network (CNN) with layer normalization, trained with the Connectionist Temporal Classification (CTC) loss criterion on SUBAK.KO, a mostly read speech corpus for Bangla, a low-resource and morphologically rich language. Experimental evaluation reveals that the ASR model trained on SUBAK.KO has difficulty recognizing speech from domains that consist mostly of spontaneous speech and contain a high number of out-of-vocabulary (OOV) words. The same ASR model, on the other hand, performs better on read speech domains that contain fewer OOV words. In addition, we report the outcomes of our experiments with layer normalization, input feature extraction, the number of convolutional layers, and other design choices, and set a baseline on SUBAK.KO. BanSpeech will be made publicly available to meet the need for a challenging evaluation benchmark for Bangla ASR.