The emergence of large language models (LLMs) has led to LLM-generated text that is highly sophisticated and often nearly indistinguishable from text written by humans. However, this development has also raised concerns about potential misuse, such as spreading misinformation and disrupting the education system. Although many detection approaches have been proposed, a comprehensive understanding of their achievements and remaining challenges is still lacking. This survey provides an overview of existing LLM-generated text detection techniques and aims to strengthen the control and regulation of language generation models. Furthermore, we highlight crucial considerations for future research, including the development of comprehensive evaluation metrics and the threat posed by open-source LLMs, to drive progress in the area of LLM-generated text detection.