Misinformation is considered a threat to our democratic values and principles. Such content lacks the rigor of traditional journalism, and its spread on social media polarizes society and undermines public discourse by distorting public perceptions and generating social unrest. Transformers and transfer learning have proven to be state-of-the-art methods for many well-known natural language processing tasks. In this paper, we propose MisRoB{\AE}RTa, a novel transformer-based deep neural ensemble architecture for misinformation detection. MisRoB{\AE}RTa leverages two transformers (BART \& RoBERTa) to improve classification performance. We also benchmark and evaluate the performance of multiple transformers on the task of misinformation detection. For training and testing, we use a large real-world dataset of news articles labeled with 10 classes, which addresses two shortcomings of current research: it increases the dataset size from small to large, and it shifts the focus of fake news detection from binary to multi-class classification. We manually verified the content of the news articles in this dataset to ensure that they were correctly labeled. The experimental results show that the accuracy of transformers on the misinformation detection problem is significantly influenced by the method employed to learn the context, the dataset size, and the vocabulary dimension. We observe empirically that, among the classification models that use only one transformer, BART obtains the best accuracy, while DistilRoBERTa achieves the best accuracy relative to the time required for fine-tuning and training. The proposed MisRoB{\AE}RTa outperforms the other transformer models on the task of misinformation detection. To support this conclusion, we performed extensive ablation and sensitivity testing of MisRoB{\AE}RTa on two datasets.
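The abstract describes MisRoB{\AE}RTa only at a high level, as an ensemble that combines BART and RoBERTa. As a rough illustration of what combining two transformers for 10-class classification can look like, the Python sketch below fuses two precomputed embeddings by concatenation and scores them with a single linear layer followed by softmax. This is an assumption made for illustration, not the authors' architecture: the encoders are omitted, and the function names (`concat`, `linear`, `softmax`, `classify`), embedding sizes, and toy weights are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical late-fusion sketch: two sentence embeddings (assumed to come from
# a BART encoder and a RoBERTa encoder, not included here) are concatenated and
# passed through one linear layer plus softmax over the class labels.

def concat(a: List[float], b: List[float]) -> List[float]:
    """Concatenate the two transformer embeddings into one feature vector."""
    return a + b

def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Compute W x + b, where `weights` has one row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits: List[float]) -> List[float]:
    """Turn logits into probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(bart_emb: List[float], roberta_emb: List[float],
             weights: List[List[float]], bias: List[float]) -> Tuple[int, List[float]]:
    """Return the predicted class index and the class probabilities."""
    probs = softmax(linear(concat(bart_emb, roberta_emb), weights, bias))
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs

# Toy usage with made-up 2-dimensional embeddings and 10 classes.
NUM_CLASSES = 10
bart_emb = [0.5, -1.0]
roberta_emb = [2.0, 0.0]
# Toy weights: class k reads only feature k of the fused 4-dim vector (k < 4);
# the remaining classes have all-zero rows.
weights = [[1.0 if j == k else 0.0 for j in range(4)] for k in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
pred, probs = classify(bart_emb, roberta_emb, weights, bias)
print(pred)  # → 2
```

Concatenation is only the simplest way to merge two representations; a real ensemble could instead add per-branch layers before fusion or average per-model predictions, and the abstract does not specify which approach MisRoB{\AE}RTa uses.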