Protecting human rights is one of the most important problems in the world today. In this paper, we present a dataset covering one of the most significant human rights issues of recent months, one that affected the whole world: the George Floyd incident. We propose a labeled dataset for topic detection that contains 17 million tweets. The tweets were collected from 25 May 2020 to 21 August 2020, covering 89 days from the start of the incident. We labeled the dataset by monitoring the most trending news topics in global and local newspapers. In addition, we present two baselines, TF-IDF and LDA, and evaluate both at three different values of k using precision, recall, and F1-score. The collected dataset is available at https://github.com/MeysamAsgariC/BLMT.
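The evaluation at different k values can be illustrated with a small sketch. The abstract does not define k, so this assumes k is the number of top-ranked terms kept from a baseline's output and compared against the labeled topics for the same period. The function name `precision_recall_f1_at_k` and the example terms are hypothetical and are not taken from the dataset or its repository.

```python
from typing import Dict, List, Set


def precision_recall_f1_at_k(ranked: List[str], gold: Set[str], k: int) -> Dict[str, float]:
    """Score the top-k predicted terms against a set of gold topic labels.

    Assumption: `ranked` is a baseline's output ordered from most to least
    relevant (for example by TF-IDF weight), and `gold` holds the labeled topics.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    # A hit is a predicted term that also appears among the gold labels.
    hits = len(set(top_k) & gold)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical ranked terms and gold labels, for illustration only.
ranked_terms = ["protest", "police", "floyd", "covid", "election"]
gold_topics = {"floyd", "protest", "minneapolis"}

for k in (1, 3, 5):
    print(k, precision_recall_f1_at_k(ranked_terms, gold_topics, k))
```

In this toy case, a larger k admits more non-matching terms without finding new gold topics, so precision falls while recall stays flat. This is one reason to report the metrics at several k values.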