In recent years, people have become increasingly reliant on social media for news and information, and some social media users post unsubstantiated information to attract attention. Such posts are known as rumours. Rumour detection has received growing attention since the outbreak of the COVID-19 pandemic, which has led to the spread of a large number of rumours. In this paper, a Natural Language Processing (NLP) system is built to detect rumours, and the best-performing model is applied to COVID-19 tweets for exploratory data analysis. The contribution of this study is twofold: (1) a comparison of rumours and facts using state-of-the-art NLP models along two dimensions, language structure and propagation route; and (2) an analysis of how rumours differ from facts in their lexical use and the emotions they imply. This study shows that linguistic structure distinguishes rumours from facts better than the propagation path does. In addition, rumour tweets contain more vocabulary related to politics and negative emotions.