This paper describes our participating system for the multi-modal fact verification (Factify) challenge at AAAI 2022. Despite recent advances in text-based verification techniques and large pre-trained multimodal models bridging vision and language, very limited work has been done on applying multimodal techniques to automate the fact-checking process, particularly given the increasing prevalence of claims and fake news involving images and videos on social media. In our work, the challenge is treated as a multimodal entailment task and framed as multi-class classification. We propose and explore two baseline approaches: an ensemble model (combining two uni-modal models) and a multi-modal attention network (modeling the interaction between the image and text pairs from the claim and the evidence document). We conduct several experiments investigating and benchmarking different SoTA pre-trained transformers and vision models in this work. Our best model ranked first on the leaderboard, obtaining a weighted average F-measure of 0.77 on both the validation and test sets. We also carry out an exploratory analysis of the Factify dataset, which uncovers salient patterns and issues (e.g., word overlap, visual entailment correlation, source bias) that motivate our hypotheses. Finally, we highlight challenges of the task and the multimodal dataset for future research.