Risk assessment instrument (RAI) datasets, particularly ProPublica's COMPAS dataset, are commonly used in algorithmic fairness papers because benchmarking practice favors comparing algorithms on datasets used in prior work. In many cases, this data is used as a benchmark to demonstrate good performance without accounting for the complexities of criminal justice (CJ) processes. We show that pretrial RAI datasets contain numerous measurement biases and errors inherent to CJ pretrial evidence and, due to disparities in discretion and deployment, support only limited claims about real-world outcomes. This makes the datasets a poor fit for benchmarking under assumptions of ground truth and real-world impact. Conventional practices of simply replicating previous data experiments may implicitly inherit or reinforce normative positions without explicitly interrogating their assumptions. Viewed in the context of how interdisciplinary fields have engaged in CJ research, algorithmic fairness practices are poorly aligned for meaningful contribution to CJ, and would benefit from transparent engagement with normative considerations and values related to fairness, justice, and equality. These factors prompt questions about whether benchmarks for intrinsically socio-technical systems like the CJ system can exist in a beneficial and ethical way.