Over the past few months, a huge number of tweets and discussions about the coronavirus (COVID-19) have circulated in the Arab region. It is important for policy makers and many others to identify the types of tweets being shared in order to better understand public behavior, topics of interest, requests from governments, sources of tweets, etc. It is also crucial to prevent the spread of rumors and misinformation about the virus or bad cures. To this end, we present the largest manually annotated dataset of Arabic tweets related to COVID-19. We describe our annotation guidelines, analyze the dataset, and build effective machine learning and transformer-based models for classification.