Social media platforms allow users to share their opinions freely on any issue. However, they also make it easier to spread hateful and abusive content. The Fulani ethnic group has been a victim of this phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, covering three languages: English, Nigerian Pidgin, and Hausa. We present a benchmark experiment that uses pre-trained language models to classify tweets as either hateful or non-hateful. In our experiment, the XLM-T model performs best, achieving a weighted F1 of 99.83%. We release the dataset at https://github.com/hausanlp/HERDPhobia to support further research.
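The benchmark is reported as a weighted F1 score. This metric averages the per-class F1 scores, weighting each class by its support (the number of gold examples in that class). The sketch below shows that computation. It is illustrative only and is not the authors' evaluation code. The label names "hate" and "non-hate" are hypothetical placeholders, not necessarily the dataset's actual label schema.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # gold examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # Count true positives and false positives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp  # gold examples of this class that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1  # weight each class's F1 by its support
    return score


# Hypothetical labels for four tweets.
gold = ["hate", "hate", "non-hate", "non-hate"]
pred = ["hate", "non-hate", "non-hate", "non-hate"]
print(round(weighted_f1(gold, pred), 4))  # 0.7333
```

Weighting by support means the score mainly reflects performance on the larger class. This is worth keeping in mind when the hateful and non-hateful classes are imbalanced.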