The sharing of fake news and conspiracy theories on social media has widespread negative effects. By designing and applying different machine learning models, researchers have made progress in detecting fake news from text. However, existing research places a heavy emphasis on general, common-sense fake news, while in reality fake news often involves rapidly changing topics and domain-specific vocabulary. In this paper, we present our methods and results for three fake news detection tasks at the MediaEval 2021 benchmark that specifically involve COVID-19-related topics. We experiment with a group of text-based models including Support Vector Machines, Random Forest, BERT, and RoBERTa. We find that a pre-trained transformer yields the best validation results, but a randomly initialized transformer with smart design can also be trained to reach accuracy close to that of the pre-trained transformer.