Mis- and disinformation are a substantial global threat to our security and safety. To cope with the scale of online misinformation, researchers have been working on automating fact-checking by retrieving and verifying claims against relevant evidence. However, despite many advances, a comprehensive evaluation of the possible attack vectors against such systems is still lacking. In particular, the automated fact-verification process may be vulnerable to the very disinformation campaigns it is trying to combat. In this work, we assume an adversary that automatically tampers with the online evidence in order to disrupt the fact-checking model, either by camouflaging the relevant evidence or by planting misleading evidence. We first propose an exploratory taxonomy that spans these two targets and the different threat-model dimensions. Guided by it, we design and propose several potential attack methods. We show that it is possible to subtly modify claim-salient snippets in the evidence and to generate diverse, claim-aligned evidence. As a result, we severely degrade fact-checking performance under many different permutations of the taxonomy's dimensions. The attacks are also robust against post-hoc modifications of the claim. Our analysis further hints at potential limitations in models' inference when faced with contradicting evidence. We emphasize that these attacks can have harmful implications for the inspectable and human-in-the-loop usage scenarios of such models, and we conclude by discussing challenges and directions for future defenses.