Post-editing in Automatic Speech Recognition (ASR) entails automatically correcting common and systematic errors produced by the ASR system. The output of an ASR system is particularly prone to phonetic and spelling errors. In this paper, we propose to use a powerful pre-trained sequence-to-sequence model, BART, further adaptively trained to serve as a denoising model, to correct errors of these types. The adaptive training is performed on an augmented dataset obtained by synthetically inducing errors and by incorporating actual errors from an existing ASR system. We also propose a simple approach to rescore the outputs using word-level alignments. Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors and yields improved WER compared to a competitive baseline. We also report a negative result on the related task of grammatical error correction in Hindi, which shows the limitation of our proposed model in capturing wider context.
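The following is a minimal, hypothetical sketch (not the authors' code) of the general idea of adaptively training BART as a denoising corrector, assuming the Hugging Face transformers library and paired (noisy ASR hypothesis, reference transcript) examples; model names, example strings, and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch: fine-tune BART to map noisy ASR hypotheses to references.
# Assumes pairs of (noisy_hypothesis, reference) strings are available.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

noisy = "the quick brown focks jumps over the lazy dog"     # ASR hypothesis with a phonetic error
reference = "the quick brown fox jumps over the lazy dog"   # gold transcript

inputs = tokenizer(noisy, return_tensors="pt", truncation=True, max_length=128)
labels = tokenizer(reference, return_tensors="pt", truncation=True, max_length=128).input_ids

# One training step: the model is optimized to reconstruct the reference
# from the corrupted input, i.e. to act as an error-correcting denoiser.
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
outputs.loss.backward()

# At inference time, corrected text is produced with beam search decoding.
corrected_ids = model.generate(inputs.input_ids, num_beams=4, max_length=128)
print(tokenizer.decode(corrected_ids[0], skip_special_tokens=True))
```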