Classification, a heavily studied data-driven machine learning task, drives an increasing number of prediction systems involving critical human decisions such as loan approval and criminal risk assessment. However, classifiers often demonstrate discriminatory behavior, especially when presented with biased data. Consequently, fairness in classification has emerged as a high-priority research area. Data management research is showing increasing interest in topics related to data and algorithmic fairness, including fair classification. The interdisciplinary efforts in fair classification, in which machine learning research has the largest presence, have resulted in a large number of fairness notions and a wide range of approaches that have not been systematically evaluated and compared. In this paper, we contribute a broad analysis of 13 fair classification approaches and additional variants with respect to their correctness, fairness, efficiency, scalability, robustness to data errors, sensitivity to the underlying ML model, data efficiency, and stability, using a variety of metrics and real-world datasets. Our analysis highlights novel insights into the impact of different metrics and high-level approach characteristics on different aspects of performance. We also discuss general principles for choosing approaches suitable for different practical settings, and identify areas where data-management-centric solutions are likely to have the most impact.