This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from the final RNN layer but also from middle layers. Our proposed method increases the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). The proposed method improves on the current state-of-the-art language model and achieves the best scores on the Penn Treebank and WikiText-2 datasets, which are standard benchmarks. Moreover, we show that our proposed method contributes to two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
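The core idea described above — mixing next-word probability distributions computed from the final and middle RNN layers — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the shapes, the shared output projection `W`, and the mixture weights `mix` are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max())
    return e / e.sum()

vocab, hidden = 5, 4
rng = np.random.default_rng(0)

# Hypothetical shared output projection and per-layer hidden states
# (final layer plus two middle layers).
W = rng.normal(size=(hidden, vocab))
layer_states = [rng.normal(size=hidden) for _ in range(3)]

# Mixture weights over layers; in practice these would be learned.
mix = softmax(rng.normal(size=3))

# Each layer yields its own next-word distribution; the model's
# prediction is their convex combination, which remains a valid
# probability distribution.
per_layer = [softmax(h @ W) for h in layer_states]
p = sum(w * d for w, d in zip(mix, per_layer))
print(round(p.sum(), 6))
```

Because the mixture weights are non-negative and sum to one, the combined output `p` is itself a proper distribution over the vocabulary.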