With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output shares only verifiable information about the external world. In this work, we present a new evaluation framework, Attributable to Identified Sources (AIS), for assessing the output of natural language generation models when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline that allows annotators to evaluate model output appropriately according to the AIS guidelines. We empirically validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset) via human evaluation studies, which suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release the guidelines for the human evaluation studies.