With the growing adoption of short-form video by social media platforms, reducing the spread of misinformation through video posts has become a critical challenge for social media providers. In this paper, we develop methods to detect misinformation in social media posts, exploiting modalities such as video and text. Due to the lack of large-scale public multi-modal datasets for misinformation detection, we collect 160,000 video posts from Twitter and leverage self-supervised learning to learn expressive representations of joint visual and textual data. We propose two new methods for detecting semantic inconsistencies within short-form social media video posts, based on contrastive learning and masked language modeling. We demonstrate that our approaches outperform current state-of-the-art methods both on artificial data generated by random swapping of positive samples and in the wild on a new manually labeled test set for semantic misinformation.