Due to recent advances in natural language processing, several works have applied BERT's pre-trained masked language model (MLM) to the post-correction of speech recognition. However, existing pre-trained models consider only semantic correction, while the phonetic features of words are neglected. Semantic-only post-correction consequently degrades performance, since homophone errors are fairly common in Chinese ASR. In this paper, we propose a novel approach that jointly exploits the contextualized representation and the phonetic information between an error and its replacement candidates to reduce the error rate of Chinese ASR. Experimental results on real-world speech recognition datasets show that our proposed method achieves an evidently lower CER than the baseline model, which uses a pre-trained BERT MLM as the corrector.
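The abstract describes combining an MLM's contextual candidate scores with phonetic similarity to pick a replacement for a suspected error. The paper does not give its exact scoring formula, so the sketch below is only illustrative: it assumes a linear interpolation (weight `alpha`) between an MLM probability and a pinyin-string similarity, with a toy pinyin table and hypothetical candidate scores standing in for real model outputs.

```python
# Hypothetical sketch of phonetics-aware candidate re-ranking, NOT the paper's
# actual method. The pinyin table and MLM scores below are illustrative only.
from difflib import SequenceMatcher

# Toy character-to-pinyin mapping (assumption; a real system would use a
# full pronunciation lexicon such as a pinyin dictionary).
PINYIN = {'在': 'zai', '再': 'zai', '的': 'de'}

def phonetic_sim(a: str, b: str) -> float:
    """Similarity of two characters' pinyin strings, in [0, 1]."""
    return SequenceMatcher(None, PINYIN.get(a, a), PINYIN.get(b, b)).ratio()

def rerank(error_char: str, mlm_candidates: list[tuple[str, float]],
           alpha: float = 0.5) -> str:
    """Pick the candidate maximizing an interpolation of the MLM score
    and phonetic similarity to the (suspected) erroneous character."""
    return max(
        mlm_candidates,
        key=lambda cp: alpha * cp[1] + (1 - alpha) * phonetic_sim(error_char, cp[0]),
    )[0]

# A semantic-only corrector might prefer '的' (higher MLM score), but the
# phonetic term favors the homophone '再' (same pinyin as the error '在').
best = rerank('在', [('再', 0.4), ('的', 0.5)])
```

Here `best` resolves to `'再'`: even though `'的'` has the higher MLM score, the homophone wins once phonetic evidence is weighted in, which is the intuition behind the paper's claim that semantic-only correction misses homophone errors.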