Growing applications of machine learning in policy and social impact settings have raised concerns about fairness implications, especially for racial minorities. These concerns have generated considerable interest among machine learning and artificial intelligence researchers, who have developed new methods and established theoretical bounds for improving fairness, focusing on source data, regularization and model training, or post-hoc adjustments to model scores. However, little work has studied the practical trade-offs between fairness and accuracy in real-world settings to understand how these bounds and methods translate into policy choices and impact on society. Our empirical study fills this gap by investigating the impact of mitigating disparities on accuracy across several policy settings, focusing on the common context of using machine learning to inform benefit allocation in resource-constrained programs across education, mental health, criminal justice, and housing safety. We show that fairness-accuracy trade-offs in many applications are negligible in practice. In every setting, we find that by explicitly focusing on achieving equity and using our proposed post-hoc disparity mitigation methods, fairness can be substantially improved without sacrificing accuracy. This observation was robust across the policy contexts studied, the scale of resources available for intervention, time, and the relative size of the protected groups. These empirical results challenge a commonly held assumption that reducing disparities either requires accepting an appreciable drop in accuracy or developing novel, complex methods, making it more practical to reduce disparities in these applications.
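The abstract refers to post-hoc disparity mitigation in resource-constrained, top-k allocation settings without spelling the method out. The Python below is a minimal illustrative sketch, not the authors' exact procedure: it greedily spends a fixed intervention budget so that recall is roughly equalized across protected groups, rather than simply taking the global top-k by score. The function name and the greedy heuristic are assumptions for exposition, and the use of observed labels to track recall makes this an evaluation-time illustration; in deployment, per-group recall would have to be estimated from held-out historical data.

```python
import numpy as np

def equalized_recall_selection(scores, labels, groups, k):
    """Illustrative post-hoc mitigation sketch: fill a budget of k
    interventions by repeatedly taking the highest-scored remaining
    candidate from whichever group currently has the lowest recall
    (selected positives / total positives) among those chosen so far."""
    group_ids = np.unique(groups)
    # Per-group candidate indices, ordered by descending model score.
    queues = {
        g: iter(np.flatnonzero(groups == g)[np.argsort(-scores[groups == g])])
        for g in group_ids
    }
    pending = {g: next(queues[g], None) for g in group_ids}
    total_pos = {g: max(int(labels[groups == g].sum()), 1) for g in group_ids}
    hits = {g: 0 for g in group_ids}
    selected = np.zeros(len(scores), dtype=bool)

    for _ in range(k):
        live = [g for g in group_ids if pending[g] is not None]
        if not live:
            break  # every group's candidate list is exhausted
        # Give the next intervention slot to the group lagging in recall.
        g = min(live, key=lambda grp: hits[grp] / total_pos[grp])
        i = pending[g]
        selected[i] = True
        hits[g] += int(labels[i])
        pending[g] = next(queues[g], None)
    return selected

# Synthetic usage example: two groups, scores loosely predictive of need.
rng = np.random.default_rng(0)
scores = rng.random(2000)
groups = rng.integers(0, 2, size=2000)
labels = (rng.random(2000) < scores).astype(int)
picked = equalized_recall_selection(scores, labels, groups, k=200)
for g in (0, 1):
    m = groups == g
    print(f"group {g} recall:", labels[m & picked].sum() / labels[m].sum())
```

Because the adjustment only reorders who receives a fixed budget of interventions, it leaves the underlying model untouched, which is consistent with the abstract's observation that equity can be improved without retraining or sacrificing accuracy.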