In the criminal legal context, risk assessment algorithms are touted as data-driven, well-tested tools. Practitioners typically cite studies known as validation tests to show that a particular risk assessment algorithm has predictive accuracy, establishes meaningful differences between risk groups, and maintains some measure of group fairness in treatment. To establish these goals, most validation tests use a one-shot, single-point measurement. Using a Polya urn model, we explore the implications of feedback effects in sequential scoring-decision processes. We show through simulation that risk can propagate over sequential decisions in ways that one-shot tests do not capture. For example, even a very small or undetectable bias in risk allocation can amplify over sequential risk-based decisions, producing observable group differences after a number of decision iterations. Risk assessment tools operate in a highly complex and path-dependent process, fraught with historical inequity. We conclude from this study that these tools do not properly account for compounding effects and require new approaches to development and auditing.
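The amplification dynamic described above can be illustrated with a minimal Polya urn simulation. This is an illustrative sketch, not the paper's actual model: two ball colors stand in for two groups, each draw represents a risk-based decision, and the drawn color is reinforced, so a small initial tilt compounds over iterations. The function name and parameters are hypothetical.

```python
import random

def polya_urn(initial_a, initial_b, steps, seed=0):
    """Simulate a Polya urn: each draw adds a ball of the drawn color,
    so early imbalances are reinforced over sequential decisions."""
    rng = random.Random(seed)
    a, b = initial_a, initial_b
    for _ in range(steps):
        # Draw a ball in proportion to current composition (the feedback step),
        # then replace it along with one more of the same color.
        if rng.random() < a / (a + b):
            a += 1
        else:
            b += 1
    return a / (a + b)  # final share of color A

# Start nearly balanced (a roughly 1% tilt toward A) and run many
# independent simulations: the final shares spread widely instead of
# settling at the initial ratio, showing path dependence.
final_shares = [polya_urn(51, 50, 5000, seed=s) for s in range(200)]
```

Across runs, the limiting share is random (Beta-distributed in the classical urn), so a bias too small to detect in any one-shot test can still produce large, persistent group differences along individual decision paths.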