ASR Error Detection (AED) models post-process the output of Automatic Speech Recognition (ASR) systems to detect transcription errors. Modern approaches usually take text-only input, consisting solely of the ASR transcription hypothesis, and disregard additional signals from the ASR model. We instead propose using the ASR system's word-level confidence scores to improve AED performance. Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. Our experiments show the benefits of ASR confidence scores for AED, their complementary effect over the textual signal, and the effectiveness and robustness of ACE for combining these signals. To foster further research, we publish a novel AED dataset consisting of ASR outputs on the LibriSpeech corpus with annotated transcription errors.
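
One way to picture the joint encoding described above is as an extra embedding table added at the encoder's input. The sketch below is illustrative only: the abstract does not say how confidence scores are turned into embeddings, so the uniform bucketing, the element-wise sum with the token embedding, the table sizes, and every name (`confidence_to_bucket`, `make_table`, `embed_with_confidence`) are assumptions rather than the authors' implementation.

```python
import random


def confidence_to_bucket(score, num_buckets=10):
    """Map a confidence score in [0, 1] to an integer bucket id.

    Assumption: uniform-width buckets; out-of-range scores are clipped.
    """
    clipped = min(max(score, 0.0), 1.0)
    return min(int(clipped * num_buckets), num_buckets - 1)


def make_table(num_rows, dim, rng):
    """Create a randomly initialised embedding table (list of vectors)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def embed_with_confidence(token_ids, confidences, token_table, conf_table):
    """Return one input vector per token: token embedding + confidence embedding.

    Assumption: the two embeddings are combined by element-wise addition,
    in the same way positional embeddings are commonly added.
    """
    if len(token_ids) != len(confidences):
        raise ValueError("each token needs exactly one confidence score")
    num_buckets = len(conf_table)
    vectors = []
    for token_id, score in zip(token_ids, confidences):
        token_vec = token_table[token_id]
        conf_vec = conf_table[confidence_to_bucket(score, num_buckets)]
        vectors.append([t + c for t, c in zip(token_vec, conf_vec)])
    return vectors


# Hypothetical toy setup: 100-token vocabulary, 10 confidence buckets, dim 8.
rng = random.Random(0)
token_table = make_table(100, 8, rng)
conf_table = make_table(10, 8, rng)

# Three transcribed words with the ASR system's word-level confidences.
vectors = embed_with_confidence([5, 17, 42], [0.98, 0.31, 0.75],
                                token_table, conf_table)
```

In a trained model both tables would be learned jointly with the encoder, so the downstream layers could attend to low-confidence words in the context of the surrounding text.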