It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks

Risk assessment instrument (RAI) datasets, particularly ProPublica's COMPAS dataset, are commonly used in algorithmic fairness papers due to benchmarking practices of comparing algorithms on datasets used in prior work. In many cases, this data is used as a benchmark to demonstrate good performance without accounting for the complexities of criminal justice (CJ) processes. However, we show that pretrial RAI datasets can contain numerous measurement biases and errors, and due to disparities in discretion and deployment, algorithmic fairness applied to RAI datasets is limited in making claims about real-world outcomes. These reasons make the datasets a poor fit for benchmarking under assumptions of ground truth and real-world impact. Furthermore, conventional practices of simply replicating previous data experiments may implicitly inherit or edify normative positions without explicitly interrogating value-laden assumptions. Without context of how interdisciplinary fields have engaged in CJ research and context of how RAIs operate upstream and downstream, algorithmic fairness practices are misaligned for meaningful contribution in the context of CJ, and would benefit from transparent engagement with normative considerations and values related to fairness, justice, and equality. These factors prompt questions about whether benchmarks for intrinsically socio-technical systems like the CJ system can exist in a beneficial and ethical way.
