Despite significant progress in text generation models, a serious limitation is their tendency to produce text that is factually inconsistent with information in the input. Recent work has studied whether textual entailment systems can be used to identify factual errors; however, these sentence-level entailment models are trained to solve a different problem than generation filtering and they do not localize which part of a generation is non-factual. In this paper, we propose a new formulation of entailment that decomposes it at the level of dependency arcs. Rather than focusing on aggregate decisions, we instead ask whether the semantic relationship manifested by individual dependency arcs in the generated output is supported by the input. Human judgments on this task are difficult to obtain; we therefore propose a method to automatically create data based on existing entailment or paraphrase corpora. Experiments show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods or those based on question generation, while additionally localizing the erroneous parts of the generation.
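To make the arc-level formulation concrete, the following is a minimal sketch of how dependency-arc factuality checking could be wired up, assuming spaCy is available for parsing. The arc_supported function is a naive lexical stand-in, not the paper's trained dependency arc entailment (DAE) classifier; it is included only to show where such a learned model would plug in and how errors get localized to individual arcs.

```python
# Sketch: decompose a generated sentence into dependency arcs and flag
# the arcs that do not appear to be supported by the input.
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_arcs(sentence: str):
    """Extract (head, relation, dependent) arcs from a generated sentence."""
    doc = nlp(sentence)
    return [(tok.head.text, tok.dep_, tok.text)
            for tok in doc if tok.dep_ != "ROOT"]

def arc_supported(input_text: str, arc) -> bool:
    """Placeholder check: treat an arc as supported if both its head and
    dependent tokens occur in the input. A trained DAE model would replace
    this with a learned entailment decision over the arc."""
    head, _, dep = arc
    input_tokens = {tok.text.lower() for tok in nlp(input_text)}
    return head.lower() in input_tokens and dep.lower() in input_tokens

def localize_errors(input_text: str, generation: str):
    """Return arcs in the generation that are not supported by the input,
    localizing which parts of the generation look non-factual."""
    return [arc for arc in dependency_arcs(generation)
            if not arc_supported(input_text, arc)]
```

Unlike a sentence-level entailment score, the output here is a list of specific arcs, which is what allows the erroneous parts of the generation to be pinpointed rather than only rejecting the whole sentence.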