$N$-gram language models (LMs) have been largely superseded by neural LMs, as the latter exhibit better performance. However, we find that $n$-gram models can achieve satisfactory performance on a large proportion of testing cases, indicating that they have already captured abundant knowledge of the language at a relatively low computational cost. Motivated by this observation, we propose to learn a neural LM that fits the residual between an $n$-gram LM and the real-data distribution. The combination of $n$-gram and neural LMs not only allows the neural part to focus on a deeper understanding of language but also provides a flexible way to customize an LM by switching the underlying $n$-gram model without changing the neural model. Experimental results on three typical language tasks (i.e., language modeling, machine translation, and summarization) demonstrate that our approach consistently attains additional performance gains over popular standalone neural models. We also show that our approach allows for effective domain adaptation by simply switching to a domain-specific $n$-gram model, without any extra training. Our code is released at https://github.com/ghrua/NgramRes.
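To make the residual formulation concrete, the sketch below shows one plausible way to combine a fixed $n$-gram LM with a neural residual component. This is a minimal illustration under our own assumptions, not the released implementation: the class name `NeuralResidualLM`, the helper `combined_log_probs`, and the use of a GRU backbone are all hypothetical, and the $n$-gram probabilities are assumed to come from a pretrained external model (e.g., a KenLM-style model queried offline).

```python
# Minimal sketch (assumptions, not the authors' code): a neural model produces
# residual logits that are added to log n-gram probabilities before the softmax,
# so the neural part only has to model what the n-gram LM misses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralResidualLM(nn.Module):
    """Hypothetical neural component that outputs residual logits."""

    def __init__(self, vocab_size: int, hidden_size: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq) -> residual logits: (batch, seq, vocab)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)


def combined_log_probs(residual_logits: torch.Tensor,
                       ngram_probs: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
    """Combine the two models: softmax over (log n-gram prob + residual logits).

    Swapping `ngram_probs` for a different domain's n-gram model changes the
    final distribution without retraining the neural part.
    """
    return F.log_softmax(torch.log(ngram_probs + eps) + residual_logits, dim=-1)


# Training would minimize the NLL of the combined distribution, e.g.:
#   log_p = combined_log_probs(model(inputs), ngram_probs)
#   loss = F.nll_loss(log_p.transpose(1, 2), targets)
```

Under this reading, domain adaptation without extra training corresponds to replacing `ngram_probs` with probabilities from a domain-specific $n$-gram model at inference time.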