Large language models (LLMs) have recently shown revolutionary power in a variety of natural language processing (NLP) tasks such as text classification, sentiment analysis, language translation, and question answering. As LLMs become more advanced and prevalent, detecting machine-generated texts (MGTs) is becoming increasingly important. These models can generate human-like language that is difficult to distinguish from human-written text, which raises concerns about authenticity, accountability, and potential bias. However, existing MGT detection methods have been evaluated under different model architectures, datasets, and experimental settings, so there is no comprehensive evaluation framework that compares these methodologies on equal terms. In this paper, we fill this gap by proposing the first benchmark framework for MGT detection, named MGTBench. Extensive evaluations on public datasets with curated answers generated by ChatGPT (the most representative and powerful LLM thus far) show that most current detection methods perform unsatisfactorily against MGTs. An exception is ChatGPT Detector, which is trained on ChatGPT-generated texts and performs well in detecting MGTs. Nonetheless, we note that adversarially crafted perturbations affecting only a small fraction of an MGT can evade ChatGPT Detector, highlighting the need for more robust MGT detection methods. We envision that MGTBench will serve as a benchmark tool that accelerates future work on evaluating state-of-the-art MGT detection methods on their respective datasets and on developing more advanced MGT detection methods. Our source code and datasets are available at https://github.com/xinleihe/MGTBench.