Recent studies show that Natural Language Processing (NLP) technologies propagate societal biases about demographic groups associated with attributes such as gender, race, and nationality. To create interventions and mitigate these biases and associated harms, it is vital to be able to detect and measure such biases. While existing works propose bias evaluation and mitigation methods for various tasks, there remains a need to cohesively understand which biases and specific harms these measures capture, and how different measures compare with each other. To address this gap, this work presents a practical framework of harms and a series of questions that practitioners can answer to guide the development of bias measures. As a validation of our framework and documentation questions, we also present several case studies of how existing bias measures in NLP -- both intrinsic measures of bias in representations and extrinsic measures of bias in downstream applications -- can be aligned with different harms, and how our proposed documentation questions facilitate a more holistic understanding of what bias measures are measuring.