This paper proposes a new approach to detecting offensive language and hate speech on social media. The approach incorporates an offensive lexicon of implicit and explicit offensive and swearing expressions, each annotated with one of two classes: context-dependent offensive or context-independent offensive. Given the severity of hate speech and offensive comments in Brazil and the lack of research on Portuguese, we validate the proposed method on Brazilian Portuguese. Nevertheless, the proposal may be applied to other languages and domains. In our experiments, the proposed approach achieved high performance, surpassing the current baselines for European and Brazilian Portuguese.
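To make the idea of a lexicon annotated with two classes concrete, the following is a minimal sketch of how such a lexicon might be turned into features for a classifier. It is not the paper's implementation: the lexicon entries, their labels, the function name `lexicon_features`, and the simple count-based features are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon. The entries and their labels are illustrative
# assumptions, not taken from the lexicon described in the abstract.
LEXICON = {
    "idiota": "context-independent",  # assumed offensive regardless of context
    "lixo": "context-dependent",      # "trash": assumed offensive only in some contexts
    "porco": "context-dependent",     # "pig": assumed offensive only in some contexts
}


def lexicon_features(comment: str) -> dict:
    """Count lexicon matches in a comment, grouped by annotation class."""
    # Lowercase and split into word tokens (Unicode-aware in Python 3).
    tokens = re.findall(r"\w+", comment.lower())
    counts = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    return {
        "context_independent": counts["context-independent"],
        "context_dependent": counts["context-dependent"],
    }


features = lexicon_features("Esse político é um idiota, um lixo!")
print(features)
```

Counts like these could be appended to a bag-of-words or embedding representation before training a classifier; whether the original method combines the lexicon with its model in this way is not stated in the abstract.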