The Stereotype Content Model (SCM) states that we tend to perceive minority groups as cold, incompetent, or both. In this paper we adapt existing work to demonstrate that the SCM holds for contextualised word embeddings, then use these results to evaluate a fine-tuning process designed to drive a language model away from stereotyped portrayals of minority groups. We find that SCM terms capture bias better than demographic-agnostic terms related to pleasantness. Further, we reduce the presence of stereotypes in the model through a simple fine-tuning procedure that requires minimal human and computational resources and does not harm downstream performance. We present this work as a prototype of a debiasing procedure that aims to remove the need for a priori knowledge of the specifics of bias in the model.