Machine learning models are commonly used to detect toxicity in online conversations. These models are trained on datasets annotated by human raters. We explore how raters' self-described identities impact how they annotate toxicity in online comments. We first define the concept of specialized rater pools: rater pools formed based on raters' self-described identities, rather than at random. We formed three such rater pools for this study: raters from the U.S. who identify as African American, raters who identify as LGBTQ, and a control pool of raters who identify as neither. Each of these rater pools annotated the same set of comments, which contains many references to these identity groups. We found that rater identity is a statistically significant factor in how raters annotate toxicity in identity-related comments. Using preliminary content analysis, we examined the comments with the most disagreement between rater pools and found nuanced differences in the toxicity annotations. Next, we trained models on the annotations from each of the different rater pools and compared the scores of these models on comments from several test sets. Finally, we discuss how using raters who self-identify with the subjects of comments can yield more inclusive machine learning models and provide more nuanced ratings than those obtained from randomly selected raters.
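To make the significance claim concrete, the sketch below shows one simple way such a comparison could be run: a chi-square test of independence between rater pool and binary toxicity label. This is an illustration only; the pool names and counts are hypothetical, and the abstract does not specify which statistical test the study actually used.

```python
# Illustrative sketch: test whether rater-pool identity is associated with
# toxicity annotations using a chi-square test over hypothetical binary labels.
from scipy.stats import chi2_contingency

# Hypothetical counts of (toxic, not toxic) annotations per specialized rater pool.
counts = {
    "african_american": (412, 588),
    "lgbtq":            (377, 623),
    "control":          (298, 702),
}

table = [list(v) for v in counts.values()]          # 3x2 contingency table
chi2, p_value, dof, expected = chi2_contingency(table)

print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
if p_value < 0.05:
    print("Rater pool and toxicity label are not independent at alpha=0.05.")
```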