We study the problem of determining whether a piece of text was authored by a human or by a large language model (LLM). Existing state-of-the-art logits-based detectors rely on statistics derived from the log-probability of the observed text, evaluated under the distribution of a given source LLM. However, relying solely on log-probabilities can be sub-optimal. In response, we introduce AdaDetectGPT -- a novel classifier that adaptively learns a witness function from training data to enhance the performance of logits-based detectors. We provide statistical guarantees on its true positive, false positive, true negative, and false negative rates. Extensive numerical studies show that AdaDetectGPT improves on the state-of-the-art method nearly uniformly across various combinations of datasets and LLMs, with gains of up to 37\%. A Python implementation of our method is available at https://github.com/Mamba413/AdaDetectGPT.
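To make the logits-based setting concrete, the sketch below computes the simplest such statistic: the average log-probability that a source model assigns to the observed tokens, thresholded to produce a decision. This is a generic baseline illustration only, not the AdaDetectGPT method (which additionally learns a witness function); the function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def mean_log_likelihood(logits, token_ids):
    """Average log-probability of the observed tokens under the source
    model's next-token distributions.

    logits: array of shape (seq_len, vocab_size) -- one row of
            next-token logits per position.
    token_ids: length-seq_len sequence of observed token indices.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the log-probability of each observed token, then average.
    return log_probs[np.arange(len(token_ids)), token_ids].mean()

def is_machine_generated(logits, token_ids, threshold=-4.0):
    # Machine-generated text tends to score higher average log-probability
    # under its source model than human text; the threshold here is an
    # arbitrary illustrative value, tuned in practice on held-out data.
    return mean_log_likelihood(logits, token_ids) > threshold
```

In practice the `logits` would come from a forward pass of the source LLM over the candidate text; here any `(seq_len, vocab_size)` array suffices to exercise the statistic.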