Detecting and classifying instances of hate in social media text has been a problem of interest in Natural Language Processing in recent years. Our work leverages state-of-the-art Transformer language models to identify hate speech in a multilingual setting. Capturing the intent of a post or comment on social media requires careful evaluation of its language style, semantic content, and additional cues such as hashtags and emojis. In this paper, we address the problem of identifying whether a Twitter post is hateful and offensive or not. We further classify the detected toxic content into one of three classes: (a) Hate Speech (HATE), (b) Offensive (OFFN) and (c) Profane (PRFN). With a pre-trained multilingual Transformer-based text encoder at the base, we are able to identify and classify hate speech in multiple languages. On the provided test corpora, we achieve Macro F1 scores of 90.29, 81.87 and 75.40 for English, German and Hindi respectively on hate speech detection, and 60.70, 53.28 and 49.74 on fine-grained classification. In our experiments, we show the efficacy of Perspective API features for hate speech classification and the effects of a multilingual training scheme. A feature selection study illustrates the impact of specific features on the architecture's classification head.
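For readers unfamiliar with the reported metric, Macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how often it occurs. The following is a minimal sketch of that computation in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions and do not come from the paper or from the shared task's evaluation script. The function returns a value in [0, 1], whereas the scores above are reported as percentages.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical, not data from the paper).
gold = ["HATE", "OFFN", "PRFN", "HATE"]
pred = ["HATE", "OFFN", "HATE", "HATE"]
score = macro_f1(gold, pred)
print(f"Macro F1: {100 * score:.2f}")
```

Because every class contributes equally, a class that is never predicted correctly (PRFN in the toy labels) lowers the average sharply even when the frequent classes are handled well.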