The goal of text generation is to enable machines to express themselves in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). Since 2014, various neural encoder-decoder models, pioneered by Seq2Seq, have been proposed to achieve this goal by learning to map input text to output text. However, the input text alone often provides limited knowledge for generating the desired output, so the performance of text generation is still far from satisfactory in many real-world scenarios. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models. This research direction is known as knowledge-enhanced text generation. In this survey, we present a comprehensive review of the research on knowledge-enhanced text generation over the past five years. The main content includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data. This survey is intended for a broad audience of researchers and practitioners in academia and industry.
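To make the encoder-decoder pattern mentioned above concrete, here is a minimal PyTorch sketch of a Seq2Seq model: an encoder compresses the input tokens into a hidden state, and a decoder generates output tokens conditioned on that state. All names, layer sizes, and the toy vocabulary are illustrative assumptions, not the architecture of any specific model discussed in the survey.

```python
# Minimal Seq2Seq sketch (illustrative only; sizes and names are assumed).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # Encode the input text; the final hidden state summarizes it.
        _, h = self.encoder(self.embed(src))
        # Decode with teacher forcing, conditioned on the encoder state.
        dec_out, _ = self.decoder(self.embed(tgt), h)
        return self.out(dec_out)  # logits over the output vocabulary

model = Seq2Seq(vocab_size=1000)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 input token sequences
tgt = torch.randint(0, 1000, (2, 5))   # corresponding output prefixes
logits = model(src, tgt)               # shape: (2, 5, 1000)
```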
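The survey's theme of incorporating knowledge beyond the input text admits many designs; one of the simplest illustrative schemes, continuing the sketch above, is to concatenate retrieved knowledge tokens onto the source sequence before encoding. This is only a toy example of the general idea, not a technique attributed to the survey.

```python
# Hypothetical knowledge enhancement: prepend retrieved knowledge tokens
# to the source so the encoder conditions on evidence beyond the input.
knowledge = torch.randint(0, 1000, (2, 4))       # stand-in for retrieved facts
enhanced_src = torch.cat([knowledge, src], dim=1)  # (2, 4 + 7) tokens
logits = model(enhanced_src, tgt)                # same decoder, richer context
```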