Coronavirus disease (COVID-19) is an infectious respiratory disease that was first identified in late December 2019 in Wuhan, China, and then spread worldwide, causing widespread panic and many deaths. Users of social networking sites such as Facebook and Twitter have focused on reading, publishing, and sharing news, tweets, and articles about the newly emerging pandemic. Many of these users employ sarcasm to convey their intended meaning in a humorous and indirect way. This makes it hard for computer-based applications to automatically understand and identify the users' goals and the level of harm their posts can inflict. Motivated by the emerging need for annotated datasets that tackle these kinds of problems in the context of COVID-19, this paper builds and releases AraCOVID19-SSD, a manually annotated Arabic COVID-19 sarcasm and sentiment detection dataset containing 5,162 tweets. To confirm the practical utility of the dataset, it has been carefully analyzed and tested using several classification models.