The emergence of the COVID-19 pandemic and the first global infodemic have changed our lives in many ways. We relied on social media both to get the latest information about the pandemic and to disseminate information. Social media content included not only health-related advice, plans, and informative news from policy makers, but also conspiracy theories and rumors. It became important to identify such information as soon as it is posted in order to make actionable decisions (e.g., debunking rumors or taking certain measures for traveling). To address this challenge, we develop and publicly release ArCovidVac, the first and largest manually annotated Arabic tweet dataset for the COVID-19 vaccination campaign, covering many countries in the Arab region. The dataset is enriched with several layers of annotation: (i) informativeness (more vs. less important tweets); (ii) fine-grained tweet content types (e.g., advice, rumors, restrictions, authentic news/information); and (iii) stance towards vaccination (pro-vaccination, neutral, anti-vaccination). Further, we performed an in-depth analysis of the data, exploring the popularity of different vaccines, trending hashtags, topics, and the presence of offensiveness in the tweets. We studied the data for individual tweet types and for temporal changes in stance towards vaccination. We benchmarked the ArCovidVac dataset using transformer architectures for informativeness, content type, and stance detection.