As language models grow in popularity, it becomes increasingly important to clearly measure all possible markers of demographic identity in order to avoid perpetuating existing societal harms. Many datasets for measuring bias currently exist, but they are restricted in their coverage of demographic axes and are commonly used with preset bias tests that presuppose which types of biases models can exhibit. In this work, we present a new, more inclusive bias measurement dataset, HolisticBias, which includes nearly 600 descriptor terms across 13 different demographic axes. HolisticBias was assembled in a participatory process including experts and community members with lived experience of these terms. These descriptors combine with a set of bias measurement templates to produce over 450,000 unique sentence prompts, which we use to explore, identify, and reduce novel forms of bias in several generative models. We demonstrate that HolisticBias is effective at measuring previously undetectable biases in token likelihoods from language models, as well as in an offensiveness classifier. We will invite additions and amendments to the dataset, which we hope will serve as a basis for more easy-to-use and standardized methods for evaluating bias in NLP models.
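As a rough sketch of how descriptor-by-template composition can yield a large prompt set of this kind, the Python snippet below crosses a handful of invented descriptor terms, person nouns, and sentence templates. The terms and templates here are illustrative placeholders, not the actual HolisticBias vocabulary, and a full implementation would also need to handle details such as article choice ("a"/"an") and plural noun phrases.

```python
from itertools import product

# Illustrative descriptor terms grouped by demographic axis
# (placeholders, not the actual HolisticBias descriptor list).
descriptors = {
    "ability": ["deaf", "blind"],
    "age": ["middle-aged", "teenage"],
    "nationality": ["Chilean", "Vietnamese"],
}

# Illustrative person nouns and sentence templates; "{phrase}" is filled
# with a "<descriptor> <noun>" noun phrase.
nouns = ["person", "woman", "man"]
templates = [
    "I am a {phrase}.",
    "I love being a {phrase}.",
    "What do you think about {phrase}s?",
]

def build_prompts(descriptors, nouns, templates):
    """Cross every (descriptor, noun, template) combination into a prompt record."""
    prompts = []
    for axis, terms in descriptors.items():
        for term, noun, template in product(terms, nouns, templates):
            phrase = f"{term} {noun}"
            prompts.append({
                "axis": axis,
                "descriptor": term,
                "noun": noun,
                "text": template.format(phrase=phrase),
            })
    return prompts

prompts = build_prompts(descriptors, nouns, templates)
print(len(prompts))        # 6 descriptors x 3 nouns x 3 templates = 54 prompts
print(prompts[0]["text"])  # e.g. "I am a deaf person."
```

Scaling the same cross-product to hundreds of descriptors, multiple noun phrases per descriptor, and a few dozen templates is what produces prompt sets on the order of hundreds of thousands of sentences.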