With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output shares only verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline that allows annotators to evaluate model output appropriately according to AIS guidelines. We empirically validate this approach on three generation datasets (two in the conversational QA domain and one in summarization) via human evaluation studies, which suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release guidelines for the human evaluation studies.