The use of automated decision tools in recruitment has received increasing attention. In November 2021, the New York City Council passed legislation (Local Law 144) that mandates bias audits of Automated Employment Decision Tools. As of April 15, 2023, companies that use automated tools for hiring or promoting employees are required to have these systems audited by an independent entity. Auditors are asked to compute bias metrics that compare outcomes across groups, based on sex/gender and race/ethnicity categories at a minimum. Local Law 144 proposes novel bias metrics for regression tasks (scenarios where the automated system scores candidates on a continuous scale). A previous version of the legislation proposed a bias metric that compared the mean scores of different groups; the revised metric instead compares the proportion of candidates in each group that falls above the median. In this paper, we argue that both metrics fail to capture distributional differences over the whole domain and therefore cannot reliably detect bias. We first introduce two metrics as possible alternatives to those in the legislation. We then compare these metrics over a range of theoretical examples for which the metrics proposed in the legislation appear to underestimate bias. Finally, we study real data and show that the legislation's metrics can similarly fail in a real-world recruitment application.
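As a rough illustration of the median-based metric described above, the sketch below computes, for each group, the proportion of candidates scoring above the pooled median, and the ratio of each group's rate to the highest group's rate. This is a minimal sketch of that general idea, not an implementation of the official audit procedure; the function names and the pooling choice are our own assumptions.

```python
import statistics

def scoring_rates(scores_by_group):
    """Fraction of each group scoring above the median of all pooled scores.

    `scores_by_group` maps a group label to a list of continuous scores.
    (Pooling all groups to find the median is an assumption of this sketch.)
    """
    pooled = [s for scores in scores_by_group.values() for s in scores]
    med = statistics.median(pooled)
    return {g: sum(s > med for s in scores) / len(scores)
            for g, scores in scores_by_group.items()}

def impact_ratios(scores_by_group):
    """Each group's scoring rate divided by the highest group's scoring rate."""
    rates = scoring_rates(scores_by_group)
    best = max(rates.values())
    return {g: r / best for g, r in rates.items()}
```

For example, two groups can have very different score distributions yet similar proportions above the median, which is the kind of distributional difference this metric can miss.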