In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. We present a series of experiments on the model topology: convolutional layers are added and removed, different depths of bidirectional long short-term memory (BLSTM) layers are used, and different label encoding methods are investigated. The experiments are carried out in a low-resource setting using our recorded Myanmar speech corpus of nearly 26 hours. The best model achieves a character error rate (CER) of 4.72% and a syllable error rate (SER) of 12.38% on the test set.
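For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how CER and SER are commonly computed: the Levenshtein (edit) distance between reference and hypothesis token sequences, summed over the test set and divided by the total reference length. The function names `edit_distance` and `error_rate` are hypothetical, and the sketch assumes that transcripts have already been split into characters (for CER) or syllables (for SER); it is not the scoring script used to obtain the figures above.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (hypothetical helper)."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_len = sum(len(r) for r in refs)
    if total_len == 0:
        raise ValueError("reference set is empty")
    return 100.0 * total_edits / total_len


# CER: tokens are characters. SER: tokens are syllables (segmentation assumed done upstream).
cer = error_rate([list("abcd")], [list("abxd")])
ser = error_rate([["ab", "cd"]], [["ab", "xd"]])
```

The same routine serves both metrics because only the tokenization differs. One substituted character gives a lower rate at the character level than at the syllable level, since a single wrong character invalidates the whole syllable; this is consistent with the SER being higher than the CER in the results above.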