In recent years, systems designed to generate text that mimics the fluency and coherence of human language have grown substantially in capability. As a result, considerable research has examined the potential uses of natural language generation (NLG) systems across a wide range of tasks. The increasing ability of powerful text generators to mimic human writing convincingly raises the potential for deception and other forms of dangerous misuse. As these systems improve and it becomes ever harder to distinguish between human-written and machine-generated text, malicious actors could leverage them to a wide variety of ends, including the creation of fake news and misinformation, the generation of fake online product reviews, or the use of chatbots to convince users to divulge private information. In this paper, we provide an overview of the NLG field by identifying and examining 119 survey-like papers focused on NLG research. From these papers, we propose a high-level taxonomy of the central concepts that constitute NLG, including the methods used to develop generalised NLG systems, the means by which these systems are evaluated, and the popular NLG tasks and subtasks that exist. We then provide an overview and discussion of each of these items with respect to current research, and examine the potential roles of NLG in deception and in the detection systems that counteract these threats. Moreover, we discuss the broader challenges of NLG, including the risks of bias that are often exhibited by existing text generation systems. This work offers a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research.