This paper investigates how hate speech varies in systematic ways according to the identities it targets. Across multiple hate speech datasets annotated for targeted identities, we find that classifiers trained on hate speech targeting specific identity groups struggle to generalize to other targeted identities. This provides empirical evidence for differences in hate speech by target identity; we then investigate which patterns structure this variation. We find that the targeted demographic category (e.g. gender/sexuality or race/ethnicity) appears to have a greater effect on the language of hate speech than does the relative social power of the targeted identity group. We also find that words associated with hate speech targeting specific identities often relate to stereotypes, histories of oppression, current social movements, and other social contexts specific to identities. These experiments suggest the importance of considering targeted identity, as well as the social contexts associated with these identities, in automated hate speech classification.