We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures. The Uzbek speech corpus (USC) comprises 958 different speakers and a total of 105 hours of transcribed audio recordings. To the best of our knowledge, this is the first open-source Uzbek speech corpus dedicated to the ASR task. To ensure high quality, the USC has been manually checked by native speakers. We first describe the design and development procedures of the USC, and then explain the conducted ASR experiments in detail. The experimental results demonstrate the promising applicability of the USC to ASR. Specifically, word error rates of 18.1% and 17.4% were achieved on the validation and test sets, respectively. To enable experiment reproducibility, we share the USC dataset, pre-trained models, and training recipes in our GitHub repository.