The mismatch between an external language model (LM) and the implicitly learned internal LM (ILM) of the RNN-Transducer (RNN-T) can limit the performance of LM integration methods such as simple shallow fusion. A Bayesian interpretation suggests removing this sequence prior, which is referred to as ILM correction. In this work, we study various ILM-correction-based LM integration methods formulated in a common RNN-T framework. We provide a decoding interpretation of two major reasons for the performance improvement with ILM correction, which is further verified experimentally with detailed analysis. We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer, which enables a theoretical justification for other ILM approaches. A systematic comparison is conducted for both in-domain and cross-domain evaluation on the Librispeech and TED-LIUM Release 2 corpora, respectively. Our proposed exact-ILM training can further improve the best ILM correction method.
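For orientation, the following is a minimal sketch of the ILM-corrected shallow fusion decision rule commonly used in this line of work; the scales $\lambda_1$, $\lambda_2$ and the labeling of the terms are assumed notation for illustration, not definitions taken from this abstract.

% Sketch of ILM-corrected shallow fusion for RNN-T decoding (assumed notation).
% P_RNN-T: transducer posterior over the label sequence a_1^N given acoustic input x_1^T,
% P_LM: external LM, P_ILM: estimated internal LM of the transducer,
% lambda_1 / lambda_2: external-LM and ILM-correction scales (tuned on a dev set).
\begin{equation*}
  \hat{a}_1^N = \operatorname*{arg\,max}_{a_1^N}
  \Bigl[ \log P_{\text{RNN-T}}(a_1^N \mid x_1^T)
       + \lambda_1 \log P_{\text{LM}}(a_1^N)
       - \lambda_2 \log P_{\text{ILM}}(a_1^N) \Bigr]
\end{equation*}

Subtracting the $\log P_{\text{ILM}}$ term corresponds to removing the internal sequence prior before adding the external LM, in line with the Bayesian interpretation mentioned above.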