Along with the COVID-19 pandemic, we are also fighting an `infodemic'. Fake news and rumors are rampant on social media, and believing them can cause significant harm, a risk that is further exacerbated during a pandemic. To tackle this, we curate and release a manually annotated dataset of 10,700 social media posts and articles containing real and fake news on COVID-19. We benchmark the annotated dataset with four machine learning baselines: Decision Tree, Logistic Regression, Gradient Boost, and Support Vector Machine (SVM). The best performance, a 93.46% F1-score, is obtained with SVM. The data and code are available at: https://github.com/parthpatwa/covid19-fake-news-dectection
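For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-score can be computed for a real/fake classification task. The abstract does not state which averaging scheme is used, so the per-class and macro-averaged variants shown here are assumptions. The function names `f1_for_class` and `macro_f1` and the toy labels are hypothetical; they are not taken from the released code.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1-score treating `positive` as the positive class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging; not confirmed by the abstract)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy, made-up labels for illustration only; these are not dataset results.
toy_true = ["real", "real", "fake", "fake", "fake", "real"]
toy_pred = ["real", "fake", "fake", "fake", "real", "real"]
toy_fake_f1 = f1_for_class(toy_true, toy_pred, "fake")
toy_macro_f1 = macro_f1(toy_true, toy_pred)
```

Because F1 combines precision and recall, it penalizes a classifier that labels nearly everything as one class, which plain accuracy may not do when the classes are unbalanced.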