Bias-measuring datasets play a critical role in detecting biased behavior of language models and in evaluating the progress of bias mitigation methods. In this work, we focus on evaluating gender bias through coreference resolution, where previous datasets are either hand-crafted or fail to reliably measure an explicitly defined bias. To overcome these shortcomings, we propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation, and construct Counter-GAP, an annotated dataset consisting of 4008 instances grouped into 1002 quadruples. We further identify a bias cancellation problem in previous group-level metrics on Counter-GAP, and propose to use the difference between inconsistency across genders and within genders to measure bias at a quadruple level. Our results show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group, and that a name-based counterfactual data augmentation method is more effective at mitigating such bias than an anonymization-based method.
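To make the quadruple-level measure concrete, the following is a minimal sketch of how the "across-gender minus within-gender inconsistency" idea could be computed, assuming each quadruple contains coreference predictions for two male-named and two female-named counterfactual variants of the same passage. The function name, input format, and normalization are hypothetical illustrations, not the paper's exact formulation.

```python
def quadruple_bias(preds):
    """Sketch of a quadruple-level bias score.

    preds: list of dicts like {"male": [p1, p2], "female": [p3, p4]},
           where each p is the model's discrete antecedent choice for one
           counterfactual variant (hypothetical input format).
    Returns: across-gender inconsistency rate minus within-gender rate.
    """
    across, within = 0, 0
    for quad in preds:
        male, female = quad["male"], quad["female"]
        # Within-gender inconsistency: disagreement between variants that
        # only swap names of the same gender (2 comparisons per quadruple).
        within += int(male[0] != male[1]) + int(female[0] != female[1])
        # Across-gender inconsistency: disagreement between variants whose
        # names differ in gender (4 comparisons per quadruple).
        across += sum(int(m != f) for m in male for f in female)
    within_rate = within / (2 * len(preds))
    across_rate = across / (4 * len(preds))
    return across_rate - within_rate

# Example: the model's prediction flips only when the name's gender flips,
# so the score is maximal (1.0) for this single quadruple.
print(quadruple_bias([{"male": ["A", "A"], "female": ["B", "B"]}]))
```

Under this sketch, a score near zero means the model is no more sensitive to gender-swapping names than to same-gender name substitutions, while a large positive score indicates gender-specific inconsistency that cannot cancel out across a group, which is the behavior the quadruple-level metric is designed to expose.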