In the past few years, toxic and hateful content has risen significantly across social media platforms. The recent Black Lives Matter movement triggered an avalanche of user-generated responses on the internet. In this paper, we propose TweetBLM, a hate speech dataset of tweets related to the Black Lives Matter movement. The dataset comprises 9165 manually annotated tweets that target the movement. We annotated each tweet as either HATE or NONHATE, based on content related to the racism that erupted from the movement for the Black community. We also report statistical insights on the dataset and systematically evaluate several machine learning models, including Random Forest, CNN, LSTM, BiLSTM, Fasttext, BERTbase, and BERTlarge, on the classification task. Through this work, we aim to contribute to the research community's substantial efforts to identify and mitigate hate speech on the internet. The dataset is publicly available.