In automatic speech recognition, many studies have shown performance improvements from language models (LMs). Recent studies have tried to use bidirectional LMs (biLMs) instead of conventional unidirectional LMs (uniLMs) for rescoring the $N$-best list decoded from the acoustic model. Despite their theoretical benefits, the biLMs have not yielded notable improvements over the uniLMs in those experiments, because their biLMs do not consider the interaction between the two directions. In this paper, we propose a novel sentence scoring method that considers the interaction between the past and the future words on the biLM. Our experimental results on the LibriSpeech corpus show that the biLM with the proposed sentence scoring outperforms the uniLM for $N$-best list rescoring, consistently and significantly under all experimental conditions. An analysis of WERs by word position demonstrates that the biLM is more robust than the uniLM, especially when a recognized sentence is short or a misrecognized word is at the beginning of the sentence.
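To make the rescoring setup concrete, here is a minimal toy sketch of $N$-best rescoring with a bidirectional sentence score: each word receives a log-probability from both its left context (a forward LM) and its right context (a backward LM), and the per-word scores are summed over the sentence. The bigram tables, smoothing floor, and function names are illustrative assumptions, not the paper's model; in particular, the paper's contribution is a scoring method that models the *interaction* between the two directions, whereas this toy simply sums independent forward and backward scores to show the rescoring mechanics.

```python
import math

# Hypothetical toy "LMs": forward P(word | previous word) and
# backward P(word | next word). Real systems use neural uni/biLMs.
FWD = {("<s>", "the"): 0.4, ("the", "cat"): 0.3, ("the", "cab"): 0.05,
       ("cat", "sat"): 0.5, ("cab", "sat"): 0.1, ("sat", "</s>"): 0.9}
BWD = {("the", "<s>"): 0.9, ("cat", "the"): 0.4, ("cab", "the"): 0.05,
       ("sat", "cat"): 0.5, ("sat", "cab"): 0.1, ("</s>", "sat"): 0.9}

def bi_score(words, floor=1e-4):
    """Sum of forward and backward log-probabilities over the sentence."""
    padded = ["<s>"] + words + ["</s>"]
    score = 0.0
    for i in range(1, len(padded)):          # forward: condition on left word
        score += math.log(FWD.get((padded[i - 1], padded[i]), floor))
    for i in range(len(padded) - 1):         # backward: condition on right word
        score += math.log(BWD.get((padded[i + 1], padded[i]), floor))
    return score

def rescore(nbest):
    """Pick the N-best hypothesis with the highest bidirectional score."""
    return max(nbest, key=lambda hyp: bi_score(hyp.split()))

# The acoustically confusable pair "cat"/"cab" is disambiguated because
# both the left and the right context vote on each word.
print(rescore(["the cab sat", "the cat sat"]))
```

Because every word is scored from both sides, an error at the sentence boundary (where a uniLM has little or no context) still receives evidence from the opposite direction, which is the intuition behind the robustness result for short sentences and sentence-initial errors.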