The transcription quality of automatic speech recognition (ASR) systems degrades significantly when transcribing audio from unseen domains. We propose an unsupervised error correction method for unsupervised ASR domain adaptation, aiming to correct transcription errors caused by domain mismatch. Unlike existing correction methods that rely on transcribed audio for training, our approach requires only unlabeled data from the target domains, to which a pseudo-labeling technique is applied to generate correction training samples. To reduce over-fitting to the pseudo data, we also propose an encoder-decoder correction model that can take into account additional information such as dialogue context and acoustic features. Experimental results show that our method obtains a significant word error rate (WER) reduction over non-adapted ASR systems. The correction model can also be applied on top of other adaptation approaches to bring an additional relative improvement of 10%.
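To make the reported metric concrete, the following is a minimal sketch of how WER and a relative WER reduction are commonly computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_reduction` are hypothetical, tokenization is assumed to be plain whitespace splitting, and the numbers in the usage note are illustrative rather than taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by the adapted system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

Under this definition, a system whose WER drops from 0.20 to 0.18 shows a 10% relative reduction, which corresponds to only 2 points of absolute reduction.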