A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e., checking whether the output text contains all and only the facts supported by the input data. We propose a new metric for evaluating the semantic accuracy of D2T generation based on a neural model pretrained for natural language inference (NLI). We use the NLI model to check textual entailment between the input data and the output text in both directions, which allows us to reveal omissions and hallucinations. Input data are converted to text for NLI using trivial templates. Our experiments on two recent D2T datasets show that our metric achieves high accuracy in identifying erroneous system outputs.
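The bidirectional check described above can be illustrated with a minimal sketch. Everything named here (`TEMPLATES`, `verbalize`, `check_accuracy`, `toy_entails`) is hypothetical and not taken from the paper. The word-overlap function `toy_entails` is only a stand-in so that the example runs without a model; in practice it would be replaced by a call to a pretrained NLI model. How the two directions are organized (one check per fact for omissions, one check against all facts for hallucinations) is also an assumption.

```python
import re
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical one-sentence templates keyed by predicate (illustrative only).
TEMPLATES: Dict[str, str] = {
    "area": "{s} is in the {o} area.",
    "food": "{s} serves {o} food.",
}


def verbalize(triple: Triple) -> str:
    """Turn one data triple into a sentence using a trivial template."""
    s, p, o = triple
    return TEMPLATES.get(p, "The {p} of {s} is {o}.").format(s=s, p=p, o=o)


_STOPWORDS = {"a", "an", "the", "is", "in", "of", "and"}


def _content_words(sentence: str) -> set:
    return set(re.findall(r"[a-z]+", sentence.lower())) - _STOPWORDS


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Stand-in for an NLI model (assumption): the hypothesis counts as
    entailed when all of its content words occur in the premise."""
    return _content_words(hypothesis) <= _content_words(premise)


def check_accuracy(
    triples: List[Triple],
    text: str,
    entails: Callable[[str, str], bool],
) -> Dict[str, object]:
    """Check entailment in both directions between data and output text."""
    facts = [verbalize(t) for t in triples]
    # Text -> fact: a fact that the text does not entail was omitted.
    omitted = [fact for fact in facts if not entails(text, fact)]
    # Facts -> text: if the data do not entail the text, the text
    # contains something unsupported (a hallucination).
    hallucination = not entails(" ".join(facts), text)
    return {
        "ok": not omitted and not hallucination,
        "omitted": omitted,
        "hallucination": hallucination,
    }


if __name__ == "__main__":
    data = [("Blue Spice", "area", "riverside"), ("Blue Spice", "food", "Italian")]
    for output in (
        "Blue Spice serves Italian food in the riverside area.",
        "Blue Spice serves Italian food.",
        "Blue Spice serves cheap Italian food in the riverside area.",
    ):
        print(output, check_accuracy(data, output, toy_entails))
```

Passing the entailment function as a parameter keeps the checking logic independent of the model, so the toy function can be swapped for a real NLI classifier without changing `check_accuracy`.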