Fact-checking has gained increasing attention due to the widespread dissemination of falsified information. Most fact-checking approaches focus only on claims made in English because of data scarcity in other languages. The lack of fact-checking datasets in low-resource languages calls for an effective cross-lingual transfer technique for fact-checking. Additionally, trustworthy information in different languages can be complementary and helpful in verifying facts. To this end, we present the first fact-checking framework augmented with cross-lingual retrieval, which aggregates evidence retrieved from multiple languages through a cross-lingual retriever. Given the absence of cross-lingual information retrieval datasets with claim-like queries, we train the retriever with our proposed Cross-lingual Inverse Cloze Task (X-ICT), a self-supervised algorithm that creates training instances by translating the title of a passage. The goal of X-ICT is for the model to learn cross-lingual retrieval by identifying the passage that corresponds to a given translated title. On the X-Fact dataset, our approach achieves a 2.23% absolute F1 improvement in the zero-shot cross-lingual setup over prior systems. The source code and data are publicly available at https://github.com/khuangaf/CONCRETE.
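
To make the X-ICT idea concrete, the following is a minimal sketch of how such training instances might be built: each passage title is translated into another language and paired with its original passage as the positive example. The names `Passage`, `XICTInstance`, `build_xict_instances`, and `toy_translate` are hypothetical and do not come from the released code; the translator is a stub standing in for a real machine translation system. The in-batch contrastive loss is likewise an assumption (a common choice for training dense retrievers), not an objective confirmed by the abstract.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Passage:
    title: str
    text: str
    lang: str


@dataclass
class XICTInstance:
    query: str       # translated title, used as a pseudo-claim
    query_lang: str  # language the title was translated into
    positive: str    # passage text, kept in its original language


def build_xict_instances(
    passages: Sequence[Passage],
    translate: Callable[[str, str, str], str],
    target_langs: Sequence[str],
) -> List[XICTInstance]:
    """Pair each translated title with the passage it came from."""
    instances = []
    for passage in passages:
        for lang in target_langs:
            if lang == passage.lang:
                continue  # assumption: only cross-lingual pairs are kept
            query = translate(passage.title, passage.lang, lang)
            instances.append(XICTInstance(query, lang, passage.text))
    return instances


def in_batch_contrastive_loss(scores: Sequence[Sequence[float]]) -> float:
    """Mean negative log-likelihood of the diagonal (positive) entries.

    scores[i][j] is the similarity between query i and passage j; the
    other passages in the batch act as negatives (an assumed setup).
    """
    total = 0.0
    for i, row in enumerate(scores):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(s - row_max) for s in row))
        total += log_z - row[i]
    return total / len(scores)


def toy_translate(title: str, src: str, tgt: str) -> str:
    # Hypothetical stand-in for a real machine translation system.
    return f"[{src}->{tgt}] {title}"


passages = [
    Passage("Eiffel Tower", "The Eiffel Tower is a landmark in Paris.", "en"),
    Passage("Torre Eiffel", "La Torre Eiffel es un monumento de París.", "es"),
]
instances = build_xict_instances(passages, toy_translate, ["en", "es"])
```

In a real setup, the similarity scores would come from a multilingual encoder applied to the queries and passages; here the loss function only shows how a retriever could be pushed to rank the source passage above the other passages in the batch.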