Alongside the COVID-19 pandemic, an "infodemic" of false and misleading information has emerged and complicated COVID-19 response efforts. Social networking sites such as Facebook and Twitter have contributed substantially to the spread of rumors, conspiracy theories, hate, xenophobia, racism, and prejudice. To combat the spread of fake news, researchers around the world have made, and continue to make, considerable efforts to build and share COVID-19-related research articles, models, and datasets. This paper releases "AraCOVID19-MFH", a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset. Our dataset contains 10,828 Arabic tweets annotated with 10 different labels. The labels are designed to capture aspects relevant to the fact-checking task, such as the tweet's check-worthiness, positivity/negativity, and factuality. To confirm the practical utility of our annotated dataset, we used it to train and evaluate several classification models and report the results obtained. Though the dataset is mainly designed for fake news detection, it can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.