The effective incorporation of cross-utterance information has the potential to improve language models (LMs) for automatic speech recognition (ASR). To extract more powerful and robust cross-utterance representations for the Transformer LM (TLM), this paper proposes the R-TLM, which uses hidden states in a long short-term memory (LSTM) LM. To encode the cross-utterance information, the R-TLM incorporates an LSTM module together with a segment-wise recurrence in some of the Transformer blocks. In addition to the LSTM module output, a shortcut connection using a fusion layer that bypasses the LSTM module is also investigated. The proposed system was evaluated on the AMI meeting corpus, the Eval2000 and the RT03 telephone conversation evaluation sets. The best R-TLM achieved 0.9%, 0.6%, and 0.8% absolute WER reductions over the single-utterance TLM baseline, and 0.5%, 0.3%, and 0.2% absolute WER reductions over a strong cross-utterance TLM baseline on the AMI evaluation set, Eval2000 and RT03, respectively. Improvements on Eval2000 and RT03 were further supported by significance tests. R-TLMs were found to have better LM scores on words where recognition errors are more likely to occur. The R-TLM WER can be further reduced by interpolation with an LSTM-LM.
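To make the described architecture concrete, the following is a minimal sketch of one R-TLM-style block in PyTorch, written only from the abstract's description: an LSTM module whose state is carried segment-wise across utterances, plus a fusion layer that combines the LSTM output with a shortcut bypassing it. The class name RTLMBlock, the layer sizes, the fuse layer, and the carry_state argument are illustrative assumptions, not the authors' released code or exact configuration.

```python
import torch
import torch.nn as nn

class RTLMBlock(nn.Module):
    """Sketch of a Transformer block augmented with an LSTM cross-utterance module
    and a fusion-layer shortcut (hypothetical layout, assumed from the abstract)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        # Standard Transformer sub-layers
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # LSTM module carrying segment-wise (cross-utterance) recurrence
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        # Fusion layer combining the LSTM output with the shortcut that bypasses it
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x, attn_mask=None, carry_state=None):
        # x: (batch, seq_len, d_model); carry_state: (h, c) from the previous utterance
        lstm_out, new_state = self.lstm(x, carry_state)
        # Shortcut connection: concatenate the LSTM output with its bypassed input, then fuse
        x = self.fuse(torch.cat([lstm_out, x], dim=-1))
        # Usual self-attention and feed-forward sub-layers with residual connections
        a, _ = self.attn(x, x, x, attn_mask=attn_mask, need_weights=False)
        x = self.norm1(x + a)
        x = self.norm2(x + self.ff(x))
        # Detach the LSTM state so it can be passed to the next segment without
        # backpropagating across utterance boundaries
        return x, tuple(s.detach() for s in new_state)
```

In such a setup, the detached (h, c) state returned for one utterance would be fed back in as carry_state for the next utterance in the same session, which is one plausible way the segment-wise recurrence described above could propagate cross-utterance information forward.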