Combating bias in NLP requires bias measurement, and bias measurement almost always relies on lexicons of seed terms, i.e., sets of words that specify stereotypes or dimensions of interest. This reproducibility study focuses on the original authors' main claim: the rationale behind the construction of these lexicons must be checked thoroughly before use, because the seeds used for bias measurement can themselves exhibit biases. The study aims to evaluate the reproducibility of the quantitative and qualitative results presented in the paper and of the conclusions drawn from them. We reproduce most of the results supporting the original authors' general claim that seed sets often suffer from biases that affect their performance as a baseline for bias metrics. Overall, our results mirror those of the original paper. They differ slightly in a few cases, but not in ways that undermine the paper's central aim of showing the fragility of seed sets.
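To illustrate why the composition of a seed set matters, the sketch below computes a WEAT-style association score: the mean cosine similarity of a target word to one seed set minus its mean cosine similarity to another. The embeddings, words, and seed sets are hypothetical toy values chosen for illustration. They are not data from the original paper or from this study, and this common seed-based score is not necessarily the exact metric the paper evaluates.

```python
import math

# Hypothetical 3-d toy embeddings; a real study would load pretrained word vectors.
EMB = {
    "he":     [1.0, 0.1, 0.0],
    "man":    [0.9, 0.2, 0.1],
    "she":    [0.1, 1.0, 0.0],
    "woman":  [0.2, 0.9, 0.1],
    "queen":  [0.3, 0.8, 0.6],  # hypothetical seed with an extra, unrelated component
    "doctor": [0.6, 0.4, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, seeds_a, seeds_b, emb=EMB):
    """WEAT-style s(w, A, B): mean cosine to seed set A minus mean cosine to seed set B."""
    w = emb[word]
    mean_a = sum(cosine(w, emb[a]) for a in seeds_a) / len(seeds_a)
    mean_b = sum(cosine(w, emb[b]) for b in seeds_b) / len(seeds_b)
    return mean_a - mean_b


male_seeds = ["he", "man"]
female_seeds = ["she", "woman"]

# Same target word and metric; only the second seed set changes.
score_1 = association("doctor", male_seeds, female_seeds)
score_2 = association("doctor", male_seeds, female_seeds + ["queen"])
print(f"original seeds: {score_1:.3f}")
print(f"with 'queen':   {score_2:.3f}")
```

With these toy vectors, appending the single seed "queen" to the female set lowers the score for "doctor", even though the target word and the other seeds are unchanged. This is the kind of sensitivity to seed composition that the original claim warns about.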