The internet gives the world an open platform on which to express views and share stories. While this is very valuable, it also makes fake news one of our society's most pressing problems. Manual fact-checking is time-consuming, which makes it challenging to disprove misleading assertions before they cause significant harm. This drives the growing interest in automatic fact or claim verification. Several existing datasets aim to support the development of automated fact-checking techniques; however, most of them are text-based, and multi-modal fact verification has received relatively scant attention. In this paper, we provide a multi-modal fact-checking dataset called FACTIFY 2, improving on FACTIFY 1 by using new data sources and adding satire articles. FACTIFY 2 has 50,000 new data instances. Similar to FACTIFY 1.0, we have three broad categories - support, no-evidence, and refute - with sub-categories based on the entailment of visual and textual data. We also provide a BERT and Vision Transformer based baseline, which achieves a 65% F1 score on the test set. The baseline code and the dataset will be made available at https://github.com/surya1701/Factify-2.0.
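To make the category scheme concrete, the following is a minimal sketch of how textual and visual entailment signals could be combined into the three broad categories and their multimodal sub-categories. The function name, label strings, and similarity thresholds are illustrative assumptions, not the paper's actual baseline: the real baseline fine-tunes BERT and a Vision Transformer, whereas this sketch simply thresholds cosine similarity over precomputed claim/document embeddings.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(claim_txt, doc_txt, claim_img, doc_img,
             t_support=0.7, t_refute=0.3, t_image=0.7):
    """Map embedding similarities to Factify-style labels.

    Hypothetical rule: textual similarity picks the broad category
    (support / no-evidence / refute); image similarity picks the
    text-only vs. multimodal sub-category. Thresholds are assumed.
    """
    t_sim = cosine(claim_txt, doc_txt)    # e.g. from BERT embeddings
    i_sim = cosine(claim_img, doc_img)    # e.g. from ViT embeddings
    img_entails = i_sim >= t_image
    if t_sim >= t_support:                # broad category: support
        return "Support_Multimodal" if img_entails else "Support_Text"
    if t_sim <= t_refute:                 # broad category: refute
        return "Refute"
    # broad category: no evidence either way
    return "Insufficient_Multimodal" if img_entails else "Insufficient_Text"
```

For example, a claim whose text and image both closely match the reference document would fall in the multimodal support sub-category, while near-orthogonal text embeddings would land in refute.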