Disinformation has become a serious problem on social media. Memes in particular, with their short format, visual appeal, and humorous nature, spread easily through online communities, which makes them an effective vehicle for disinformation. We present DisinfoMeme, a dataset to help detect disinformation memes. It contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism. The dataset poses multiple unique challenges: limited data and label imbalance, reliance on external knowledge, multimodal reasoning, layout dependency, and OCR noise. We test multiple widely used unimodal and multimodal models on this dataset. The experiments show that current models still leave substantial room for improvement.