Towards Equal Gender Representation in the Annotations of Toxic Language Detection

Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper, we focus on the differences in the ways men and women annotate comments for toxicity, investigating how these differences result in models that amplify the opinions of male annotators. We find that the BERT model associates toxic comments containing offensive words with male annotators, causing the model to predict 67.7% of toxic comments as having been annotated by men. We show that this disparity between gender predictions can be mitigated by removing offensive words and highly toxic comments from the training data. We then apply the learned associations between gender and language to toxic language classifiers, finding that models trained exclusively on female-annotated data perform 1.8% better than those trained solely on male-annotated data and that training models on data after removing all offensive words reduces bias in the model by 55.5% while increasing the sensitivity by 0.4%.
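
The mitigation described above (removing offensive words and highly toxic comments from the training data) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `OFFENSIVE_WORDS` lexicon is a placeholder, the `max_toxicity` threshold of 0.8 is arbitrary, and the `"text"`/`"toxicity"` record layout is hypothetical. The abstract does not specify the word list, the threshold, or the preprocessing the authors used.

```python
import string

# Hypothetical placeholder lexicon; the paper's actual word list is not given here.
OFFENSIVE_WORDS = {"idiot", "stupid"}


def strip_offensive_words(text, lexicon=OFFENSIVE_WORDS):
    """Drop whitespace-separated tokens whose punctuation-stripped,
    lowercased form appears in the lexicon."""
    kept = [
        word
        for word in text.split()
        if word.strip(string.punctuation).lower() not in lexicon
    ]
    return " ".join(kept)


def filter_training_data(examples, max_toxicity=0.8, lexicon=OFFENSIVE_WORDS):
    """Skip comments scored above max_toxicity (assumed 0..1 scale),
    then strip lexicon words from the remaining comments."""
    cleaned = []
    for example in examples:
        if example["toxicity"] > max_toxicity:
            continue  # treated as "highly toxic" under the assumed threshold
        cleaned.append(
            {**example, "text": strip_offensive_words(example["text"], lexicon)}
        )
    return cleaned
```

The filtered records could then be passed to whatever training pipeline is in use; comparing classifier behaviour before and after filtering is how an effect such as the one reported in the abstract would be measured.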

