ChatGPT has become a global sensation. As ChatGPT and other Large Language Models (LLMs) emerge, concerns are growing about their misuse in various ways, such as disseminating fake news, plagiarism, manipulating public opinion, cheating, and fraud. Hence, distinguishing AI-generated text from human-written text is becoming increasingly essential. Researchers have proposed various detection methodologies, ranging from basic binary classifiers to more complex deep-learning models. Some detection techniques rely on statistical characteristics or syntactic patterns, while others incorporate semantic or contextual information to improve accuracy. The primary objective of this study is to provide a comprehensive and up-to-date assessment of the most recent techniques for ChatGPT detection. Additionally, we evaluate other AI-generated text detection tools that do not specifically claim to detect ChatGPT-generated content, to assess how well they perform on it. For our evaluation, we have curated a benchmark dataset of text written by ChatGPT and by humans, covering diverse questions from the medical, open Q&A, and finance domains as well as user-generated responses from popular social networking platforms. The dataset serves as a reference for assessing how well various techniques detect ChatGPT-generated content. Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
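The abstract does not say which metrics are used to score the detectors. As a hedged illustration only, the sketch below shows one common way to score a binary "AI vs. human" detector on a labeled benchmark. It reports the true positive rate (AI text flagged as AI), the true negative rate (human text kept as human), and overall accuracy. `Sample`, `evaluate_detector`, `naive_detector`, and the toy texts are hypothetical. They are not the study's dataset, tools, or results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Sample:
    """One labeled benchmark item (hypothetical structure)."""
    text: str
    is_ai: bool  # True if the text was generated by ChatGPT


def evaluate_detector(detect: Callable[[str], bool],
                      samples: Iterable[Sample]) -> Dict[str, float]:
    """Score a binary detector: detect(text) returns True when it predicts AI-generated."""
    tp = tn = fp = fn = 0
    for s in samples:
        pred = detect(s.text)
        if s.is_ai and pred:
            tp += 1  # AI text correctly flagged
        elif s.is_ai:
            fn += 1  # AI text missed
        elif pred:
            fp += 1  # human text wrongly flagged
        else:
            tn += 1  # human text correctly kept
    total = tp + tn + fp + fn
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "tnr": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


def naive_detector(text: str) -> bool:
    """Hypothetical toy detector: flags a telltale ChatGPT phrase. Not a real tool."""
    return "as an ai language model" in text.lower()


# Invented toy samples for illustration only; not taken from the benchmark.
toy_samples = [
    Sample("As an AI language model, I cannot give medical advice.", True),
    Sample("Take ibuprofen with food to avoid stomach upset.", True),
    Sample("lol my bank charged me twice again", False),
    Sample("Index funds are a low-cost way to diversify.", False),
]

print(evaluate_detector(naive_detector, toy_samples))
# → {'tpr': 0.5, 'tnr': 1.0, 'accuracy': 0.75}
```

Reporting the true positive and true negative rates separately, rather than accuracy alone, matters on imbalanced data. A detector that labels everything as human can score high accuracy while catching no AI-generated text at all.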