Recent studies have shown that using an external Language Model (LM) benefits end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear infrequently in the training set remains challenging. Long-tail prediction problems have been widely studied in many applications, but have only been addressed by a few studies for ASR and LMs. In this paper, we propose a new memory-augmented, lookup-dictionary-based Transformer architecture for LMs. The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens. Through extensive experiments on Chinese and English data sets, our proposed method is shown to outperform the baseline Transformer LM by a large margin on both word/character error rate and tail-token error rate, without any impact on decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting ASR decoding performance, especially for long-tail tokens.
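To make the idea concrete, below is a minimal sketch of one way a memory-augmented lookup dictionary could be combined with a Transformer LM, in the spirit of kNN-LM-style retrieval. The abstract does not specify the mechanism, so everything here is an assumption: the class name `MemoryLookupLM`, the `base_lm` interface, the stored `keys`/`values` tensors, and the `k`/`lam` hyperparameters are all hypothetical, and the paper's actual dictionary construction and fusion scheme may differ.

```python
import torch
import torch.nn.functional as F


class MemoryLookupLM(torch.nn.Module):
    """Illustrative sketch (not the paper's method): a Transformer LM
    augmented with a key-value lookup dictionary.

    Keys are contextual hidden states collected from the training set;
    values are the tokens that followed those contexts. At inference,
    the nearest stored contexts vote on the next token, and the result
    is interpolated with the base LM distribution, which can help
    recover long-tail tokens the LM alone assigns low probability.
    """

    def __init__(self, base_lm, keys, values, vocab_size, k=8, lam=0.3):
        super().__init__()
        self.base_lm = base_lm      # assumed to return (logits, hidden states)
        self.keys = keys            # (N, d) stored context embeddings
        self.values = values        # (N,) next-token id for each stored context
        self.vocab_size = vocab_size
        self.k = k                  # number of neighbours to retrieve
        self.lam = lam              # interpolation weight for the memory

    def forward(self, input_ids):
        logits, hidden = self.base_lm(input_ids)       # hidden: (B, T, d)
        query = hidden[:, -1, :]                       # last-position context
        # Retrieve the k nearest stored contexts by Euclidean distance.
        dist = torch.cdist(query, self.keys)           # (B, N)
        knn_dist, knn_idx = dist.topk(self.k, largest=False)
        knn_tokens = self.values[knn_idx]              # (B, k) token ids
        # Softmax over negative distances -> neighbour weights.
        w = F.softmax(-knn_dist, dim=-1)
        mem_probs = torch.zeros(query.size(0), self.vocab_size,
                                device=query.device)
        mem_probs.scatter_add_(1, knn_tokens, w)
        lm_probs = F.softmax(logits[:, -1, :], dim=-1)
        # Blend the memory distribution with the base LM distribution.
        return (1 - self.lam) * lm_probs + self.lam * mem_probs
```

In a sketch like this, decoding efficiency depends mainly on how the nearest-neighbour search is implemented; an approximate index over the keys would keep the lookup cost roughly constant per step.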