Detection and Analysis of Offensive Online Content in Hausa Language

Hausa, a major Chadic language spoken by over 100 million people, mostly in West Africa, is considered a low-resource language from a computational linguistics perspective. This classification reflects a scarcity of the linguistic resources and tools needed for natural language processing (NLP) tasks, including the detection of offensive content. To address this gap, we conducted two sets of studies: (1) a user study (n=101) to explore cyberbullying in Hausa, and (2) an empirical study that led to the creation of the first dataset of offensive terms in the Hausa language. We developed detection systems trained on this dataset and compared their performance against relevant multilingual models, including Google Translate. Our detection system successfully identified over 70% of offensive terms, whereas the baseline models frequently mistranslated such terms. We attribute this discrepancy to the nuanced nature of the Hausa language and to the baseline models' reliance on direct, literal translation, which stems from the limited data available for building purpose-built detection systems. These findings highlight the importance of incorporating cultural context and linguistic nuances when developing NLP models for low-resource languages such as Hausa. A post hoc analysis further revealed that offensive language is particularly prevalent in discussions related to religion and politics. To foster a safer online environment, we recommend involving diverse stakeholders with expertise in local contexts and demographics. Their insights will be crucial for developing more accurate detection systems and targeted moderation strategies that align with cultural sensitivities.
