Social biases on Wikipedia, a widely read global platform, could greatly influence public opinion. While prior research has examined man/woman gender bias in biography articles, the possible influence of other demographic attributes limits the conclusions that can be drawn. In this work, we present a methodology for analyzing Wikipedia pages about people that isolates dimensions of interest (e.g., gender) from other attributes (e.g., occupation). Given a target corpus for analysis (e.g., biographies about women), we present a method for constructing a comparison corpus that matches the target corpus in as many attributes as possible except the target attribute. We develop evaluation metrics to measure how well the comparison corpus aligns with the target corpus, and we then examine how articles about gender and racial minorities (cisgender women, non-binary people, transgender women, and transgender men; African American, Asian American, and Hispanic/Latinx American people) differ from other articles. In addition to identifying suspect social biases, our results show that failing to control for covariates can lead to different conclusions and can mask biases. Our contributions include a methodology that facilitates further analyses of bias in Wikipedia articles, findings that can help Wikipedia editors reduce biases, and a framework and evaluation metrics to guide future work in this area.
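The idea of a matched comparison corpus can be illustrated with a minimal sketch. It assumes each article is represented as a set of attribute labels (for instance, Wikipedia categories) and that the labels encoding the target attribute (e.g., gender) are known in advance. Greedy Jaccard-similarity matching and a mean-overlap alignment score are swapped in here purely for illustration; they, the toy data, and the function names `build_comparison_corpus` and `mean_overlap` are hypothetical and are not the paper's actual matching algorithm or evaluation metrics.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: article id -> set of attribute labels (e.g. categories).
Corpus = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two attribute sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_comparison_corpus(
    targets: Corpus, candidates: Corpus, target_attrs: Set[str]
) -> Dict[str, str]:
    """Greedily pair each target article with its most similar unused candidate.

    Labels in target_attrs are removed before comparison, so matching depends
    only on the covariates we want to hold fixed (e.g. occupation, nationality).
    Ties go to the candidate seen first; each candidate is used at most once.
    """
    used: Set[str] = set()
    matches: Dict[str, str] = {}
    for t_id, t_labels in targets.items():
        t_cov = t_labels - target_attrs
        best_id: Optional[str] = None
        best_score = -1.0
        for c_id, c_labels in candidates.items():
            if c_id in used:
                continue
            score = jaccard(t_cov, c_labels - target_attrs)
            if score > best_score:
                best_id, best_score = c_id, score
        if best_id is not None:
            matches[t_id] = best_id
            used.add(best_id)
    return matches


def mean_overlap(
    matches: Dict[str, str], targets: Corpus, candidates: Corpus, target_attrs: Set[str]
) -> float:
    """Average covariate similarity over matched pairs (1.0 means every pair shares all covariates)."""
    scores = [
        jaccard(targets[t] - target_attrs, candidates[c] - target_attrs)
        for t, c in matches.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage with invented labels (hypothetical data, not from the paper).
targets = {
    "A": {"women", "physicists", "american"},
    "B": {"women", "novelists", "british"},
}
candidates = {
    "X": {"men", "novelists", "british"},
    "Y": {"men", "physicists", "american"},
    "Z": {"men", "painters", "french"},
}
target_attrs = {"women", "men"}

matches = build_comparison_corpus(targets, candidates, target_attrs)
print(matches)  # {'A': 'Y', 'B': 'X'}
print(mean_overlap(matches, targets, candidates, target_attrs))  # 1.0
```

Greedy matching is order-dependent and not globally optimal; an assignment solver such as the Hungarian algorithm would maximize total similarity across all pairs, at a higher computational cost.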