Fact-checking systems have become important tools for verifying fake and misleading news. These systems become more trustworthy when human-readable explanations accompany the veracity labels. However, manually collecting such explanations is expensive and time-consuming. Recent work frames explanation generation as extractive summarization and proposes to automatically select a sufficient subset of the most important facts from the ruling comments (RCs) of a professional journalist to obtain fact-checking explanations. However, these explanations lack fluency and sentence coherence. In this work, we present an iterative edit-based algorithm that uses only phrase-level edits to perform unsupervised post-editing of disconnected RCs. To regulate our editing algorithm, we use a scoring function with components including fluency and semantic preservation. In addition, we show the applicability of our approach in a completely unsupervised setting. We experiment with two benchmark datasets, LIAR-PLUS and PubHealth, and show that our model generates explanations that are fluent, readable, non-redundant, and cover the information important for the fact check.
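The iterative, score-regulated editing described above can be sketched as a greedy loop: generate candidate texts via phrase-level edits, score each with a combination of fluency and semantic-preservation terms, and keep the best candidate until no edit improves the score. The sketch below is a minimal, hypothetical illustration; its scoring functions are toy stand-ins (word-repetition penalty for fluency, word overlap for semantic preservation), whereas the actual system would use, e.g., language-model perplexity and embedding similarity, and a richer edit set (reordering, insertion, paraphrasing).

```python
def fluency_score(sentence):
    # Toy fluency proxy: penalize immediate word repetition.
    # (A real system would use a language model's perplexity.)
    words = sentence.split()
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return 1.0 / (1.0 + repeats)

def semantic_score(candidate, source):
    # Toy semantic-preservation proxy: word overlap with the source.
    # (A real system would use embedding similarity.)
    cand, src = set(candidate.split()), set(source.split())
    return len(cand & src) / max(len(src), 1)

def score(candidate, source, alpha=0.5):
    # Weighted combination of the scoring-function components.
    return alpha * fluency_score(candidate) + (1 - alpha) * semantic_score(candidate, source)

def phrase_edits(sentence):
    # Candidate generator using one simple phrase-level edit:
    # deleting a single word. (The full method also reorders and inserts.)
    words = sentence.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + words[i + 1:])

def post_edit(source, steps=10):
    # Greedy iterative loop: accept the best-scoring edit each round,
    # stop when no candidate improves on the current text.
    current = source
    current_score = score(current, source)
    for _ in range(steps):
        best = max(phrase_edits(current), key=lambda c: score(c, source), default=None)
        if best is None or score(best, source) <= current_score:
            break
        current, current_score = best, score(best, source)
    return current

print(post_edit("the the claim is is false"))  # → "the claim is false"
```

In this toy run, the loop removes the duplicated words because each deletion raises the fluency term without hurting word overlap with the source, mirroring how the scoring function is meant to trade off fluency against semantic preservation.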