Lexicon-based approaches are scientifically more elegant: they explain the components of the solution and generalize more easily to other applications. This paper therefore proposes a new approach to offensive language and hate speech detection on social media. The approach relies on a lexicon of implicit and explicit offensive and swearing expressions annotated with contextual information. Given the severity of abusive comments on social media in Brazil and the lack of research on Portuguese, we validate the models on Brazilian Portuguese. Nevertheless, the method may be applied to any other language. The experiments show that the proposed approach is effective, outperforming the current baseline methods for Portuguese.