Given the recent proliferation of false claims online, a great deal of effort has gone into manual fact-checking. Because this work is very time-consuming, human fact-checkers can benefit from tools that support them and make them more efficient. Here, we focus on building a system that could provide such support. Given an input document, the system aims to detect all sentences containing a claim that can be verified using previously fact-checked claims from a given database. The output is a re-ranked list of the document's sentences, in which those that can be verified are ranked as high as possible and are accompanied by the corresponding evidence. Unlike previous work, which has looked into claim retrieval, here we take a document-level perspective. We create a new manually annotated dataset for this task, and we propose suitable evaluation measures. We further experiment with a learning-to-rank approach, achieving sizable performance gains over several strong baselines. Our analysis demonstrates the importance of modeling text similarity and stance, while also taking into account the veracity of the retrieved previously fact-checked claims. We believe that this research will be of interest to fact-checkers, journalists, media, and regulatory authorities.
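To make the task's input and output concrete, the following is a minimal sketch of the document-level setup: each sentence is scored by its best match against a database of previously fact-checked claims, and the sentences are returned in ranked order together with the matched claim as evidence. The scoring here is plain bag-of-words cosine similarity, an assumption chosen only for illustration; it is not the learning-to-rank approach described above, and it ignores stance and veracity. The function names (`tokenize`, `cosine`, `rank_sentences`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, verified_claims):
    """Rank document sentences by their best match among verified claims.

    Returns (score, sentence, best_matching_claim) tuples, highest score
    first. The claim is None when no verified claim shares any token.
    NOTE: lexical overlap is an illustrative stand-in, not the paper's model.
    """
    claim_vectors = [(claim, Counter(tokenize(claim))) for claim in verified_claims]
    ranked = []
    for sentence in sentences:
        sentence_vector = Counter(tokenize(sentence))
        best_claim, best_score = None, 0.0
        for claim, claim_vector in claim_vectors:
            score = cosine(sentence_vector, claim_vector)
            if score > best_score:
                best_claim, best_score = claim, score
        ranked.append((best_score, sentence, best_claim))
    # Sentences most likely to be verifiable come first.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


if __name__ == "__main__":
    document = [
        "Thank you all for coming tonight.",
        "The unemployment rate fell to its lowest level in fifty years.",
    ]
    database = [
        "Unemployment rate is at its lowest level in fifty years.",
        "Crime doubled in the city last year.",
    ]
    for score, sentence, claim in rank_sentences(document, database):
        print(f"{score:.2f}\t{sentence}\t{claim}")
```

A sentence with no lexical overlap with any database entry receives a score of zero and no evidence, so it falls to the bottom of the list. A realistic system would replace the similarity function with learned features, which is where text similarity, stance, and the veracity of the retrieved claims would enter.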