With the expansion of social media and the increasing dissemination of multimedia content, the spread of misinformation has become a major concern. This necessitates effective strategies for multimodal misinformation detection (MMD) that determine whether the combination of an image and its accompanying text could mislead or misinform. Due to the data-intensive nature of deep neural networks and the labor-intensive process of manual annotation, researchers have been exploring various methods for automatically generating synthetic multimodal misinformation, which we refer to as Synthetic Misinformers, in order to train MMD models. However, limited evaluation on real-world misinformation and a lack of comparisons with other Synthetic Misinformers make it difficult to assess progress in the field. To address this, we perform a comparative study of existing and new Synthetic Misinformers involving (1) out-of-context (OOC) image-caption pairs, (2) cross-modal named entity inconsistency (NEI), and (3) hybrid approaches, and we evaluate them against real-world misinformation using the COSMOS benchmark. The comparative study showed that our proposed CLIP-based Named Entity Swapping can lead to MMD models that surpass other OOC and NEI Misinformers in terms of multimodal accuracy, and that hybrid approaches can lead to even higher detection accuracy. Nevertheless, after alleviating information leakage from the COSMOS evaluation protocol, low Sensitivity scores indicate that the task is significantly more challenging than previous studies suggested. Finally, our findings showed that NEI-based Synthetic Misinformers tend to suffer from a unimodal bias, where text-only MMD models can outperform multimodal ones.
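To make the notion of a Synthetic Misinformer concrete, the following is a minimal sketch of the simplest OOC strategy, random caption re-pairing, assuming each sample is an (image, caption) pair with a unique image identifier. The `Pair` type and `random_ooc_misinformer` function are hypothetical names used only for illustration. This is not the CLIP-based Named Entity Swapping proposed here, nor the exact procedure of any specific prior work.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Hypothetical sample type: one image identifier plus its caption."""
    image_id: str
    caption: str
    label: int  # 0 = truthful pair, 1 = falsified (synthetic) pair


def random_ooc_misinformer(pairs, seed=0):
    """Hypothetical naive OOC Misinformer (an assumption, not the paper's method).

    Keeps every image but attaches a caption drawn uniformly at random from
    a *different* sample, producing one out-of-context pair per input pair.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to create out-of-context samples")
    rng = random.Random(seed)
    synthetic = []
    for i, original in enumerate(pairs):
        # Draw an index from the other len(pairs) - 1 samples, skipping i itself.
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1
        synthetic.append(Pair(original.image_id, pairs[j].caption, label=1))
    return synthetic
```

Concatenating the original pairs (label 0) with the generated ones (label 1) yields a balanced training set for a binary MMD classifier. Because randomly re-paired captions are often obviously unrelated to the image, such a baseline is plausibly easier to detect than real-world misinformation, which is one motivation for comparing it against harder strategies such as named entity swapping.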