The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases. We seek to understand the who, why, and what behind biases in toxicity annotations. In two online studies with demographically and politically diverse participants, we investigate the effect of annotator identities (who) and beliefs (why), drawing from social psychology research on hate speech, free speech, racist beliefs, political leaning, and more. We disentangle what is annotated as toxic by considering posts with three characteristics: anti-Black language, African American English (AAE) dialect, and vulgarity. Our results show strong associations between annotator identity and beliefs and their ratings of toxicity. Notably, more conservative annotators and those who scored highly on our scale for racist beliefs were less likely to rate anti-Black language as toxic, but more likely to rate AAE as toxic. We additionally present a case study illustrating how a popular toxicity detection system's ratings inherently reflect only specific beliefs and perspectives. Our findings call for contextualizing toxicity labels in social variables, which has immense implications for toxic language annotation and detection.