Natural Language Processing (NLP) models propagate social biases about protected attributes such as gender, race, and nationality. To design interventions and mitigate these biases and their associated harms, it is vital to be able to detect and measure such biases. While many existing works propose bias evaluation methodologies for different tasks, there remains a need to cohesively understand what biases and normative harms each of these measures captures and how different measures compare. To address this gap, this work presents a comprehensive survey of existing bias measures in NLP as a function of the associated NLP tasks, metrics, datasets, and social biases and corresponding harms. This survey also organizes metrics into categories and discusses their respective advantages and disadvantages. Finally, we propose a documentation standard for bias measures to aid their development, categorization, and appropriate usage.