Mis- and disinformation are now a substantial global threat to our security and safety. To cope with the scale of online misinformation, one viable solution is to automate the fact-checking of claims by retrieving relevant evidence and verifying against it. While major recent advances have been achieved in pushing forward automatic fact verification, a comprehensive evaluation of the possible attack vectors against such systems is still lacking. In particular, the automated fact-verification process might be vulnerable to the exact disinformation campaigns it is trying to combat. In this work, we assume an adversary that automatically tampers with the online evidence in order to disrupt the fact-checking model, either by camouflaging the relevant evidence or by planting misleading evidence. We first propose an exploratory taxonomy that spans these two targets and the different threat-model dimensions. Guided by this taxonomy, we design and propose several potential attack methods. We show that it is possible to subtly modify claim-salient snippets in the evidence, in addition to generating diverse and claim-aligned evidence. As a result, we severely degrade fact-checking performance under many different permutations of the taxonomy's dimensions. The attacks are also robust against post-hoc modifications of the claim. Our analysis further hints at potential limitations in models' inference when faced with contradicting evidence. We emphasize that these attacks can have harmful implications for the inspectable and human-in-the-loop usage scenarios of such models, and we conclude by discussing challenges and directions for future defenses.