Recent studies of gender bias in computing use large datasets involving automatic predictions of gender to analyze computing publications, conferences, and other key populations. Gender bias is thus partly defined by software-driven algorithmic analysis, yet widely used gender-prediction tools can introduce unacknowledged gender bias when applied to historical research. Many given names change ascribed gender over decades: the "Leslie problem." Systematic analysis of the Social Security Administration dataset -- which records, for each year, all given names, identified by ascribed gender and frequency of use -- for 1900, 1925, 1950, 1975, and 2000 permits a rigorous assessment of the "Leslie problem." This article identifies 300 given names with measurable "gender shifts" across 1925-1975, spotlighting the 50 given names with the largest such shifts. It demonstrates, quantitatively, a net "female shift" that likely results in the overcounting of women (and undercounting of men) in earlier decades, just as computer science was professionalizing. Some aspects of the widely accepted "making programming masculine" perspective may therefore need revision.
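To make the measurement concrete, the following Python sketch shows one way a "gender shift" of this kind could be computed. It is an illustration, not the article's actual method: it assumes the public SSA year-of-birth files (e.g., yob1925.txt and yob1975.txt, one line per name in the form name,sex,count), defines a name's female share as the fraction of its occurrences recorded as female, and reports the 50 names whose female share changed most between 1925 and 1975. The file paths and the top-50 cutoff are purely illustrative.

```python
import csv
from collections import defaultdict


def female_share(path):
    """Return {name: female_fraction} for one SSA year-of-birth file.

    Each line of an SSA file has the form: name,sex,count
    (e.g., "Leslie,F,1154"). Female share is F / (F + M).
    """
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    with open(path, newline="", encoding="utf-8") as fh:
        for name, sex, count in csv.reader(fh):
            counts[name][sex] += int(count)
    return {
        name: c["F"] / (c["F"] + c["M"])
        for name, c in counts.items()
        if c["F"] + c["M"] > 0
    }


def gender_shifts(early_path, late_path):
    """Change in female share for names appearing in both years."""
    early, late = female_share(early_path), female_share(late_path)
    return {name: late[name] - early[name] for name in early.keys() & late.keys()}


if __name__ == "__main__":
    # Illustrative paths: the public SSA dataset ships one file per year.
    shifts = gender_shifts("names/yob1925.txt", "names/yob1975.txt")
    # Names whose female share rose the most between 1925 and 1975.
    for name, delta in sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)[:50]:
        print(f"{name}: {delta:+.2f}")
```

A positive delta marks a name that became more strongly ascribed to women by 1975; applying a 1975-era (or present-day) gender-prediction table to records from 1925 would therefore misclassify some bearers of such names, which is the mechanism behind the net "female shift" the article quantifies.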