To tackle the rising phenomenon of hate speech, efforts have been made toward data curation and analysis. Previous work on bias analysis has focused predominantly on race. In our work, we further investigate bias in hate speech datasets along racial, gender, and intersectional axes. We identify strong bias against African American English (AAE), masculine, and AAE+Masculine tweets, which are annotated as hateful and offensive disproportionately more often than tweets from other demographic groups. We provide evidence that BERT-based models propagate this bias and show that balancing the training data for these protected attributes can lead to fairer models with regard to gender, but not race.
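As a minimal sketch of the balancing intervention described above (not the authors' code), one could downsample every (protected attribute, label) group in the training set to the size of the smallest group before fine-tuning a BERT-based classifier. The column names ("text", "label", "dialect") and the downsampling strategy are assumptions for illustration.

```python
# Sketch: balance a hate speech training set across a protected attribute
# (e.g., inferred dialect) by downsampling, so that no demographic group
# dominates any class during training. Column names are hypothetical.
import pandas as pd


def balance_by_attribute(df: pd.DataFrame, attribute: str, seed: int = 0) -> pd.DataFrame:
    """Downsample every (attribute, label) group to the smallest group's size."""
    group_sizes = df.groupby([attribute, "label"]).size()
    n = int(group_sizes.min())
    balanced = (
        df.groupby([attribute, "label"], group_keys=False)
          .apply(lambda g: g.sample(n=n, random_state=seed))
    )
    return balanced.reset_index(drop=True)


if __name__ == "__main__":
    # Toy example: tweets annotated with a label and an inferred dialect group.
    data = pd.DataFrame({
        "text": ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"],
        "label": ["offensive", "offensive", "none", "none",
                  "offensive", "none", "offensive", "none"],
        "dialect": ["AAE", "AAE", "AAE", "SAE", "SAE", "SAE", "AAE", "SAE"],
    })
    print(balance_by_attribute(data, "dialect"))
```

The same downsampling can be applied jointly over gender and dialect columns to balance intersectional groups; the abstract's finding is that such balancing improves fairness with regard to gender but not race.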