Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in Natural Language Generation (NLG) tasks. Consequently, identifying and removing these examples is a key open challenge in creating reliable NLG systems. In this work, we introduce a framework to identify and remove low-quality training instances that lead to undesirable outputs, such as faithfulness errors in text summarization. We show that existing approaches for error tracing, such as gradient-based influence measures, do not perform reliably at detecting faithfulness errors in summarization. We overcome the drawbacks of existing error tracing methods through a new, contrast-based estimate that compares undesired generations with human-corrected outputs. Our proposed method achieves a mean average precision of 0.91 across synthetic tasks with known ground truth and yields a two-fold reduction in hallucinations in a real entity hallucination evaluation on the NYT dataset.
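To make the contrast-based idea concrete, the sketch below illustrates one way such an estimator could be set up: each training example is scored by how strongly its loss gradient aligns with the gradient contrast between an undesired generation and its human-corrected counterpart. This is a minimal, hypothetical illustration of the general idea, not the paper's exact estimator; all function and variable names (`contrastive_trace_scores`, `loss_fn`, `bad_example`, `fixed_example`) are assumptions introduced here for illustration.

```python
import torch
import torch.nn.functional as F


def contrastive_trace_scores(model, loss_fn, train_examples, bad_example, fixed_example):
    """Hypothetical sketch of a contrast-based error-tracing score.

    Scores each training example by how much more its gradient aligns with the
    undesired generation than with the human-corrected output. Higher scores
    flag training instances that plausibly push the model toward the error.
    """

    def flat_grad(example):
        # Gradient of the example's loss, flattened into a single vector.
        model.zero_grad()
        loss = loss_fn(model, example)
        loss.backward()
        return torch.cat(
            [p.grad.detach().flatten() for p in model.parameters() if p.grad is not None]
        )

    # Contrast direction: gradient of the undesired output's loss minus the
    # gradient of the corrected output's loss.
    g_contrast = flat_grad(bad_example) - flat_grad(fixed_example)

    scores = []
    for ex in train_examples:
        g_train = flat_grad(ex)
        # Training examples whose gradients point in the same direction as the
        # contrast (i.e., lower loss on the undesired output relative to the
        # corrected one) receive large positive scores.
        scores.append(F.cosine_similarity(g_train, g_contrast, dim=0).item())
    return scores
```

In practice, the highest-scoring training instances would be candidates for removal or relabeling before retraining; the cosine similarity here is just one of several plausible gradient-alignment measures.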