Despite recent progress in language generation models, their outputs may not always meet user expectations. In this work, we study whether informational feedback in natural language can be leveraged to improve generation quality and user preference alignment. To this end, we consider factual consistency in summarization, i.e., the requirement that a summary contain only information supported by the input documents, as the target of user preference alignment. We collect a high-quality dataset, DeFacto, containing human demonstrations and informational feedback in natural language, consisting of corrective instructions, edited summaries, and explanations with respect to the factual consistency of the summary. Using our dataset, we study two natural language generation tasks: 1) editing a summary using human feedback, and 2) generating human feedback from the original summary. Using the two tasks, we further evaluate whether models can automatically correct factual inconsistencies in generated summaries. We show that the human-edited summaries we collected are more factually consistent than the originals, and that pre-trained language models can leverage our dataset to improve the factual consistency of system-generated summaries on our proposed generation tasks. We make the DeFacto dataset publicly available at https://github.com/microsoft/DeFacto.