Pretrained Language Models (PLMs) have established a new paradigm by learning informative contextualized representations from large-scale text corpora. This paradigm has revolutionized the entire field of natural language processing and set new state-of-the-art performance on a wide variety of NLP tasks. However, although PLMs can store certain knowledge/facts from the training corpus, their knowledge awareness is still far from satisfactory. To address this issue, integrating knowledge into PLMs has recently become a very active research area, and a variety of approaches have been developed. In this paper, we provide a comprehensive survey of the literature on this emerging and fast-growing field: Knowledge Enhanced Pretrained Language Models (KE-PLMs). We introduce three taxonomies to categorize existing work. In addition, we survey the various NLU and NLG applications on which KE-PLMs have demonstrated superior performance over vanilla PLMs. Finally, we discuss challenges facing KE-PLMs as well as promising directions for future research.