Misinformation is now a major problem because of the potentially high risks it poses to our core democratic and societal values and orders. Out-of-context misinformation is one of the easiest and most effective ways for adversaries to spread viral false stories. In this threat, a real image is repurposed to support other narratives by misrepresenting its context and/or elements. People routinely turn to the internet to verify information using different sources and modalities. Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-caption pairing using Web evidence. To integrate evidence and cues from both modalities, we introduce the concept of 'multi-modal cycle-consistency check': starting from the image/caption, we gather textual/visual evidence, which is then compared against the other paired caption/image, respectively. Moreover, we propose a novel architecture, Consistency-Checking Network (CCN), that mimics the layered human reasoning across the same and different modalities: the caption vs. textual evidence, the image vs. visual evidence, and the image vs. caption. Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking, and significantly outperforms previous baselines that did not leverage external evidence.
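The three comparisons named above (caption vs. textual evidence, image vs. visual evidence, image vs. caption) can be illustrated with a minimal sketch. This is not the CCN, which is a learned architecture. It is a hand-written stand-in that assumes the evidence has already been retrieved and that images and captions are available as embedding vectors in a shared space. The names `Pair`, `jaccard`, `cosine`, and `consistency_scores` are hypothetical, and token overlap and cosine similarity are swapped in only as simple placeholders for learned comparisons.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Pair:
    """An image-caption pair to be checked (all fields are illustrative)."""
    image_vec: List[float]    # assumed image embedding
    caption: str
    caption_vec: List[float]  # assumed caption embedding in the same space


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings (placeholder metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def consistency_scores(
    pair: Pair,
    textual_evidence: Sequence[str],
    visual_evidence: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Score the three comparisons of the cycle-consistency check.

    textual_evidence: text assumed to be gathered starting from the image.
    visual_evidence: image embeddings assumed to be gathered from the caption.
    Each evidence score is the best match over the available evidence.
    """
    return {
        "caption_vs_text": max(
            (jaccard(pair.caption, t) for t in textual_evidence), default=0.0
        ),
        "image_vs_visual": max(
            (cosine(pair.image_vec, v) for v in visual_evidence), default=0.0
        ),
        "image_vs_caption": cosine(pair.image_vec, pair.caption_vec),
    }
```

Returning the three scores separately, rather than one fused number, keeps the sketch inspectable: a low `caption_vs_text` alongside a high `image_vs_visual` shows which side of the cycle disagrees.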