The societal impact of pre-trained language models has prompted researchers to probe them for strong associations between protected attributes and value-loaded terms, from slurs to prestigious job titles. Such work is said to probe models for bias or fairness, or such probes 'into representational biases' are said to be 'motivated by fairness', suggesting an intimate connection between the two. We provide conceptual clarity by distinguishing between association biases (Caliskan et al., 2022) and empirical fairness (Shen et al., 2022) and show that the two can be independent. Our main contribution, however, is showing why this should not come as a surprise. To this end, we first present a thought experiment showing how association bias and empirical fairness can be completely orthogonal. Next, we provide empirical evidence that there is no correlation between bias metrics and fairness metrics across the most widely used language models. Finally, we survey the sociological and psychological literature and show that it provides ample support for expecting these metrics to be uncorrelated.