How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collect tens of thousands of comparison responses from both human experts and ChatGPT, with questions spanning open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conduct comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, which reveal many interesting results. After that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by ChatGPT or humans. We build three different detection systems, explore several key factors that influence their effectiveness, and evaluate them in different scenarios. The dataset, code, and models are all publicly available at https://github.com/Hello-SimpleAI/chatgpt-comparison-detection.


Related content

ChatGPT


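ChatGPT (full name: Chat Generative Pre-trained Transformer) is a chatbot program developed by the US company OpenAI [1] and released on November 30, 2022. ChatGPT is an AI-driven natural language processing tool. It holds conversations by learning and understanding human language, responds according to the context of the chat, and converses much as a person would. It can also complete tasks such as writing emails, video scripts, marketing copy, translations, code, and papers. [1] https://openai.com/blog/chatgpt/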

NeurIPS 2022 | A Categorized Collection of Natural Language Processing Papers

Zhuanzhi Member Service

51+ reads · October 2, 2022

Latest Collection of 100+ Papers on Self-Supervised Learning

Zhuanzhi Member Service

166+ reads · March 18, 2020

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Service

49+ reads · October 17, 2019

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Zhuanzhi Member Service

59+ reads · October 17, 2019