Although automatic speech recognition (ASR) systems have achieved significant improvements in recent years, recognition errors still occur that can be easily spotted by human beings. Various language modeling techniques have been developed for post-recognition tasks such as semantic correction. In this paper, we propose a Transformer-based semantic correction method initialized from a pretrained BART model. Experiments on a 10,000-hour Mandarin speech dataset show that the character error rate (CER) is reduced by 21.7% relative compared to our baseline ASR system. Expert evaluation demonstrates that the actual improvement of our model surpasses what the CER indicates.