This article combines humanistic "data critique" with informed inspection of big data analysis. It measures gender bias when gender prediction software tools (Gender API, Namsor, and Genderize.io) are used in historical big data research. Gender bias is measured by contrasting personally identified computer science authors in the well-regarded DBLP dataset (1950-1980) with exactly comparable results from the software tools. Implications for public understanding of gender bias in computing and for the nature of the computing profession are outlined. A preliminary assessment of the Semantic Scholar dataset is also presented. The conclusion combines humanistic approaches with selective use of big data methods.
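The measurement idea compares each author's individually identified gender with a tool's prediction for the same name. The snippet below is a minimal sketch under stated assumptions and does not reproduce the article's actual procedure. The labels, the `"unknown"` value for names a tool declines to classify, the function name `bias_report`, and the ten-author sample are all hypothetical. No real Gender API, Namsor, or Genderize.io output is used. The sketch reports the true female share, the share of names left unclassified, the female share among names the tool did classify, and per-gender error rates.

```python
from typing import Dict, List, Optional


def bias_report(truth: List[str], predicted: List[str]) -> Dict[str, Optional[float]]:
    """Compare hypothetical ground-truth labels with a tool's predictions.

    Both lists hold "female" or "male" per author; `predicted` may also hold
    "unknown" (a hypothetical convention for names the tool does not classify).
    """
    if not truth or len(truth) != len(predicted):
        raise ValueError("truth and predicted must be non-empty and the same length")
    n = len(truth)
    report: Dict[str, Optional[float]] = {
        "true_female_share": truth.count("female") / n,
        "unclassified_share": predicted.count("unknown") / n,
    }
    # Keep only authors the tool actually assigned a gender to.
    classified = [(t, p) for t, p in zip(truth, predicted) if p != "unknown"]
    report["predicted_female_share"] = (
        sum(p == "female" for _, p in classified) / len(classified) if classified else None
    )
    # Error rate per true gender: share of classified authors of that gender
    # whom the tool labelled as the other gender.
    for gender in ("female", "male"):
        pairs = [(t, p) for t, p in classified if t == gender]
        wrong = sum(p != gender for _, p in pairs)
        report[f"{gender}_error_rate"] = wrong / len(pairs) if pairs else None
    return report


# Hypothetical toy sample of ten authors (not data from DBLP).
truth = ["female", "female", "male", "male", "male",
         "male", "male", "male", "female", "male"]
predicted = ["male", "female", "male", "male", "unknown",
             "male", "male", "male", "unknown", "male"]

if __name__ == "__main__":
    print(bias_report(truth, predicted))
```

In this invented sample, the true female share is 0.3, but the tool's classified output gives 0.125. This happens because one woman is labelled male and another woman's name goes unclassified. Women's names therefore carry a 0.5 error rate against 0.0 for men. Separating unclassified names from misclassified ones matters here, because the two failure modes bias an aggregate estimate in different ways.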