The fluency and factual knowledge of large language models (LLMs) heighten the need for corresponding systems to detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning. In this paper, we first demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging whether a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by the 20B-parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.
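To make the criterion concrete: the idea is to compare the log probability the model assigns to the original passage against the average log probability it assigns to perturbed variants, d(x) ≈ log p(x) − mean over perturbations x̃ of log p(x̃), with large positive d suggesting the passage sits at a local maximum (negative curvature) and was likely sampled from the model. The sketch below is a minimal illustration of this discrepancy using Hugging Face transformers; the choice of `gpt2` as the scored model, the `naive_perturb` word-dropout helper, and the perturbation count are placeholder assumptions for illustration only (the paper instead perturbs by filling masked spans with T5), not the authors' implementation.

```python
# Minimal sketch of a DetectGPT-style perturbation discrepancy.
# Assumption: `naive_perturb` is a crude stand-in for the paper's
# T5 mask-filling perturbations.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder for the model of interest
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def log_prob(text: str) -> float:
    """Average per-token log probability of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)  # loss = mean negative log-likelihood
    return -out.loss.item()

def naive_perturb(text: str, frac: float = 0.15) -> str:
    """Placeholder perturbation: randomly drop a fraction of words.
    DetectGPT uses T5 to rewrite masked spans; this is only a stand-in."""
    words = text.split()
    kept = [w for w in words if random.random() > frac]
    return " ".join(kept) if kept else text

def perturbation_discrepancy(text: str, perturb, n: int = 20) -> float:
    """d(x) = log p(x) - mean_i log p(perturb(x)_i).
    A large positive value suggests the text was sampled from the model."""
    base = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n)]
    return base - sum(perturbed) / n

if __name__ == "__main__":
    passage = "The quick brown fox jumps over the lazy dog."  # example input
    d = perturbation_discrepancy(passage, naive_perturb, n=10)
    print(f"perturbation discrepancy: {d:.3f}")
```

In practice, classification thresholds on d are chosen per model and domain; the AUROC numbers above measure how well this discrepancy separates model samples from human-written text across all thresholds.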