In this paper, we introduce ArCOV19-Rumors, an Arabic COVID-19 Twitter dataset for misinformation detection, composed of tweets containing claims posted from 27th January until the end of April 2020. We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K tweets relevant to those claims. The tweets were manually annotated for veracity to support research on misinformation detection, one of the major problems faced during a pandemic. ArCOV19-Rumors supports two levels of misinformation detection over Twitter: verifying free-text claims (claim-level verification) and verifying claims expressed in tweets (tweet-level verification). In addition to health, our dataset covers claims in other topical categories that were influenced by COVID-19, namely social, politics, sports, entertainment, and religion. Moreover, we present benchmarking results for tweet-level verification on the dataset. We experimented with state-of-the-art models from a range of approaches that exploit content, user profile features, temporal features, or the propagation structure of conversational threads for tweet verification.