In recent years, there has been an unprecedented proliferation of fake news. As a consequence, we are more susceptible to the pernicious impact that the spread of misinformation and disinformation can have on different segments of our society. Thus, the development of tools for the automatic detection of fake news plays an important role in preventing its negative effects. Most attempts to detect and classify false content rely only on textual information. Multimodal approaches are less frequent, and they typically classify news only as either true or fake. In this work, we perform a fine-grained classification of fake news on the Fakeddit dataset, using both unimodal and multimodal approaches. Our experiments show that the multimodal approach based on a Convolutional Neural Network (CNN) architecture combining text and image data achieves the best results, with an accuracy of 87%. Some fake news categories, such as Manipulated content, Satire, or False connection, strongly benefit from the use of images. Using images also improves the results for the other categories, but to a lesser extent. Among the unimodal approaches using only text, Bidirectional Encoder Representations from Transformers (BERT) is the best model, with an accuracy of 78%. Therefore, exploiting both text and image data significantly improves the performance of fake news detection.
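To make the idea of combining text and image data concrete, the following is a minimal sketch of one common fusion scheme: concatenating a text feature vector with an image feature vector and applying a linear layer with softmax over the fine-grained classes. It is not the architecture evaluated in this work. The function names, the feature dimensions, the random weights, and the full six-class label list (the abstract names only some of the categories) are assumptions made for illustration.

```python
import math
import random

# Assumed label set: the six fine-grained Fakeddit classes.
# The abstract names only some of them, so this list is an assumption.
CLASSES = [
    "True",
    "Satire",
    "Misleading content",
    "Imposter content",
    "False connection",
    "Manipulated content",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_feats, image_feats, weights, bias):
    """Concatenate the two feature vectors and apply a linear layer + softmax.

    Hypothetical helper: in a real model the features would come from
    trained text and image encoders, and the weights would be learned.
    """
    fused = list(text_feats) + list(image_feats)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Placeholder inputs: random vectors stand in for encoder outputs.
rng = random.Random(0)
text_feats = [rng.uniform(-1, 1) for _ in range(8)]
image_feats = [rng.uniform(-1, 1) for _ in range(8)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(16)] for _ in CLASSES]
bias = [0.0] * len(CLASSES)

probs = fuse_and_classify(text_feats, image_feats, weights, bias)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because the weights here are random, the predicted label carries no meaning; the sketch only shows how the two modalities feed a single classifier, so that the image features can contribute evidence the text alone does not provide.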