Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks as measured by automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric itself remains underexplored. Because assessing the quality of NLG models is an arduous task and traditional statistical metrics are known to correlate poorly with human judgments, we ask whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation of ChatGPT to assess its reliability as an NLG metric. Specifically, we treat ChatGPT as a human evaluator and give it task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instructions that prompt it to score the outputs of NLG models. We conduct experiments on three widely used NLG meta-evaluation datasets, covering summarization, story generation, and data-to-text tasks. Experimental results show that, compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with gold human judgments. We hope this preliminary study will encourage the emergence of a general-purpose, reliable NLG metric.
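The two steps described in the abstract, prompting with a task- and aspect-specific instruction and then meta-evaluating against human judgments, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names (`build_prompt`, `average_ranks`, `spearman`), the 1-to-5 scale, and the sample scores are all hypothetical, and the abstract does not say which correlation coefficient is used. Spearman rank correlation is shown here only as one common choice. The call to ChatGPT itself is omitted; the sketch assumes its scores have already been collected.

```python
from statistics import mean


def build_prompt(task, aspect, source, output, low=1, high=5):
    """Compose a task- and aspect-specific scoring instruction.

    The wording is a hypothetical template, not the one used in the paper.
    """
    return (
        f"Score the following {task} with respect to {aspect} "
        f"on a scale from {low} to {high}, where {low} is the worst "
        f"and {high} is the best.\n\n"
        f"Source:\n{source}\n\n"
        f"Output:\n{output}\n\n"
        f"Score:"
    )


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Illustrative, made-up scores for five system outputs.
    metric_scores = [4, 2, 5, 3, 1]  # e.g. scores parsed from model replies
    human_scores = [5, 1, 4, 3, 2]   # e.g. human ratings for the same outputs
    print(build_prompt("summary", "relevance", "<article text>", "<summary text>"))
    print("Spearman correlation:", spearman(metric_scores, human_scores))
```

A higher correlation means the metric ranks outputs more like the human annotators do, which is the sense in which a metric is "state-of-the-art or competitive" in a meta-evaluation.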


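Related content

ChatGPT

ChatGPT (full name: Chat Generative Pre-trained Transformer) is a chatbot developed by the US company OpenAI [1] and released on November 30, 2022. It is an AI-driven natural language processing tool that converses by learning and understanding human language, responds based on the context of the conversation, and can carry out tasks such as writing emails, video scripts, marketing copy, translations, code, and papers. [1] https://openai.com/blog/chatgpt/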
