The deployment environment of an ASR model is ever-changing, and incoming speech can switch between domains within a single session. This makes effective domain adaptation challenging when only target-domain text data is available: the objective is to clearly improve performance on the target domain while degrading performance on the general domain as little as possible. In this paper, we propose an adaptive LM fusion approach called internal language model estimation based adaptive domain adaptation (ILME-ADA). In ILME-ADA, an interpolated log-likelihood score is computed from the maximum of the scores given by the internal LM and the external LM (ELM). We demonstrate the efficacy of the proposed ILME-ADA method with both RNN-T and LAS modeling frameworks, employing neural network and n-gram LMs as ELMs respectively, on two domain-specific (target) test sets. Compared with both shallow fusion and ILME-based LM fusion, the proposed method achieves significantly better performance on the target test sets with minimal performance degradation on the general test set.
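To make the contrast between the three fusion schemes concrete, the sketch below scores a single token under each of them. The abstract only states that the ILME-ADA score is an interpolation based on the maximum of the internal and external LM scores; the exact formula used here, the function names, and the single shared weight `lam` are assumptions made for illustration and should not be read as the authors' implementation.

```python
import math


def shallow_fusion(log_p_e2e: float, log_p_elm: float, lam: float) -> float:
    """Shallow fusion: add a weighted external LM log-probability."""
    return log_p_e2e + lam * log_p_elm


def ilme_fusion(log_p_e2e: float, log_p_ilm: float, log_p_elm: float,
                lam_ilm: float, lam_elm: float) -> float:
    """ILME-based fusion: subtract the estimated internal LM score,
    then add the external LM score."""
    return log_p_e2e - lam_ilm * log_p_ilm + lam_elm * log_p_elm


def ilme_ada_fusion(log_p_e2e: float, log_p_ilm: float, log_p_elm: float,
                    lam: float) -> float:
    """Assumed ILME-ADA reading: the external LM term is replaced by
    max(internal LM, external LM), so the correction vanishes whenever
    the external LM scores the token lower than the internal LM does."""
    return log_p_e2e + lam * (max(log_p_ilm, log_p_elm) - log_p_ilm)


if __name__ == "__main__":
    # Hypothetical token probabilities, chosen only for illustration.
    e2e, ilm = math.log(0.5), math.log(0.2)
    for name, elm in [("ELM prefers token", math.log(0.4)),
                      ("ELM dislikes token", math.log(0.1))]:
        print(name,
              round(shallow_fusion(e2e, elm, 0.3), 4),
              round(ilme_fusion(e2e, ilm, elm, 0.3, 0.3), 4),
              round(ilme_ada_fusion(e2e, ilm, elm, 0.3), 4))
```

Under this assumed reading, a token that the external LM scores lower than the internal LM keeps its original end-to-end score, which is one plausible way to limit degradation on general-domain speech; a token that the external LM favors receives the same boost as ILME fusion with equal weights.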