Social media has become a primary platform for people worldwide to voice their opinions. Because anonymity gives users a greater sense of freedom, they can disregard social etiquette online and attack others without facing severe consequences, which inevitably propagates hate speech. Current measures to filter online content and curb the spread of hatred do not go far enough. One contributing factor is the prevalence of regional languages on social media and the paucity of language-flexible hate speech detectors. The proposed work focuses on analyzing hate speech in Hindi-English code-switched language. Our method explores transformation techniques to capture a precise text representation. To preserve the structure of the data while keeping it usable with existing algorithms, we developed MoH, or Map Only Hindi; "moh" means "love" in Hindi. The MoH pipeline consists of language identification followed by Roman-to-Devanagari Hindi transliteration using a knowledge base of Roman Hindi words, and finally employs fine-tuned Multilingual BERT and MuRIL language models. We conducted several quantitative experiments on three datasets and evaluated performance using Precision, Recall, and F1 metrics. The first experiment studies the performance of MoH-mapped text with classical machine learning models and shows an average increase of 13% in F1 scores. The second compares the proposed work's scores with those of the baseline models and shows a 6% rise in performance. Finally, the third compares the proposed MoH technique with various data simulations using an existing transliteration library. Here, MoH outperforms the rest by 15%. Our results demonstrate a significant improvement over the state-of-the-art scores on all three datasets.
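The mapping step described in the abstract can be illustrated with a minimal sketch. It assumes that a dictionary lookup stands in for both language identification and transliteration: a token found in the knowledge base is treated as Roman Hindi and replaced with its Devanagari form, while every other token is left untouched. The name `map_only_hindi` and the four-entry `ROMAN_HINDI_KB` are hypothetical and are not the authors' implementation or knowledge base. The final stage, the fine-tuned Multilingual BERT and MuRIL classifiers, is not shown.

```python
# Hypothetical toy knowledge base: Roman Hindi word -> Devanagari form.
# The actual MoH knowledge base is not reproduced here.
ROMAN_HINDI_KB = {
    "tum": "तुम",
    "bahut": "बहुत",
    "ho": "हो",
    "nahi": "नहीं",
}


def map_only_hindi(text, kb=ROMAN_HINDI_KB):
    """Map Roman Hindi tokens to Devanagari; leave other tokens unchanged."""
    mapped = []
    for token in text.split():
        # Assumption: membership in the knowledge base doubles as a crude
        # token-level language identification step.
        mapped.append(kb.get(token.lower(), token))
    return " ".join(mapped)


print(map_only_hindi("tum bahut stupid ho"))  # तुम बहुत stupid हो
```

Only the Hindi tokens change script, so the English part of a code-switched sentence reaches the downstream model as written. A real system would also need to handle punctuation and the spelling variants common in Roman Hindi, which this sketch ignores.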