Disinformation entails the purposeful dissemination of falsehoods in service of a larger, dubious agenda and the chaotic fracturing of a society. The general public has grown aware of the misuse of social media toward these nefarious ends, and even global public health crises have not been immune to misinformation (deceptive content spread without intended malice). In this paper, we examine nearly 505K COVID-19-related tweets from the initial months of the pandemic to understand misinformation as a function of bot behavior and engagement. Using a correlation-based feature selection method, we selected the 11 most relevant feature subsets from over 170 features to distinguish misinformation from facts and to predict highly engaging misinformation tweets about COVID-19. We achieved an average F-score of at least 72\% with ten popular multi-class classifiers, reinforcing the relevance of the selected features. We found that (i) real users tweet both facts and misinformation, while bots tweet proportionally more misinformation; (ii) misinformation tweets were less engaging than facts; (iii) the textual content of a tweet was most important for distinguishing fact from misinformation, while (iv) user account metadata and human-like activity were most important for predicting high engagement in both factual and misinformation tweets; and (v) sentiment features were not relevant.
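To make the idea of correlation-based feature selection concrete, the following is a minimal sketch, not the procedure used in the paper. The abstract does not state which correlation measure or search strategy was applied, so the absolute Pearson correlation, the greedy forward search, the `min_gain` threshold, and all function and feature names below are assumptions chosen for illustration. The toy columns are made-up values, not data from the study. The sketch scores a subset with the commonly cited CFS merit, which rewards features that correlate with the class label and penalizes features that correlate with each other.

```python
import math


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / math.sqrt(vx * vy))


def cfs_merit(subset, columns, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # Mean feature-to-class correlation.
    r_cf = sum(pearson(columns[f], target) for f in subset) / k
    if k == 1:
        return r_cf
    # Mean feature-to-feature correlation over all unordered pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, target, max_features, min_gain=1e-9):
    """Greedy forward search: add the feature that most improves the merit."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in columns if f not in selected]
        if not candidates:
            break
        merit, feature = max(
            (cfs_merit(selected + [f], columns, target), f) for f in candidates
        )
        if merit <= best + min_gain:
            break  # no candidate improves the subset, so stop
        selected.append(feature)
        best = merit
    return selected, best


# Toy data (hypothetical): one informative feature, a rescaled copy of it,
# and a feature that is uncorrelated with the label.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "feat_a": [1, 2, 1, 2, 5, 6, 5, 6],
    "feat_a_scaled": [2, 4, 2, 4, 10, 12, 10, 12],
    "feat_noise": [1, 2, 2, 1, 1, 2, 2, 1],
}

selected, merit = greedy_cfs(columns, target, max_features=3)
print(selected, round(merit, 4))
```

On this toy input the search keeps a single informative feature: the rescaled copy adds redundancy without adding class correlation, and the uncorrelated feature lowers the merit, so neither is selected.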