Social media platforms have become new battlegrounds for anti-social elements, with misinformation being the weapon of choice. Fact-checking organizations try to debunk as many claims as possible while staying true to their journalistic processes, but they cannot keep pace with the rapid dissemination of misinformation. We believe that the solution lies in partial automation of the fact-checking life cycle, saving human time for tasks that require high cognition. We propose a new workflow for efficiently detecting previously fact-checked claims that uses abstractive summarization to generate crisp queries. These queries can then be executed on a general-purpose retrieval system associated with a collection of previously fact-checked claims. We curate an abstractive text summarization dataset comprising noisy claims from Twitter and their gold summaries. We show that, compared to verbatim querying, retrieval performance improves 2x with popular out-of-the-box summarization models and 3x when those models are fine-tuned on the accompanying dataset. Our approach achieves a Recall@5 of 35% and an MRR of 0.3, compared to baseline values of 10% and 0.1, respectively. Our dataset, code, and models are publicly available: https://github.com/varadhbhatnagar/FC-Claim-Det/
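To make the two reported metrics concrete, the following is a minimal sketch of how Recall@k and MRR could be computed over ranked retrieval results. It is not taken from the accompanying repository: the function names, the toy tweet and fact-check identifiers, and the hit-rate reading of Recall@k (a query counts as a hit if at least one matching fact-check appears in the top k) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int = 5) -> float:
    """Fraction of queries with at least one relevant document in the top k.

    Assumption: hit-rate style Recall@k, which matches ordinary recall
    when each claim has exactly one matching fact-check.
    """
    hits = sum(1 for q, docs in ranked.items() if relevant[q] & set(docs[:k]))
    return hits / len(ranked)


def mean_reciprocal_rank(ranked: Dict[str, List[str]],
                         relevant: Dict[str, Set[str]]) -> float:
    """Average of 1/rank of the first relevant document (0 if none is retrieved)."""
    total = 0.0
    for q, docs in ranked.items():
        for rank, doc in enumerate(docs, start=1):
            if doc in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Hypothetical toy data: each tweet query maps to a ranked list of fact-check IDs.
ranked = {
    "t1": ["f3", "f1", "f9"],  # relevant fact-check at rank 2
    "t2": ["f4", "f5", "f6"],  # relevant fact-check not retrieved
    "t3": ["f7", "f2", "f8"],  # relevant fact-check at rank 1
    "t4": ["f5", "f2", "f4"],  # relevant fact-check at rank 2
}
relevant = {"t1": {"f1"}, "t2": {"f0"}, "t3": {"f7"}, "t4": {"f2"}}

print(recall_at_k(ranked, relevant, k=5))      # 0.75
print(mean_reciprocal_rank(ranked, relevant))  # 0.5
```

In the workflow described above, the ranked lists would come from running either the verbatim tweet or its generated summary as the query, so the same two functions can compare both settings on equal terms.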