The rise of algorithmic decision-making has spawned much research on fair machine learning (ML). Financial institutions use ML to build risk scorecards that support a range of credit-related decisions. Yet, the literature on fair ML in credit scoring is scarce. This paper makes two contributions. First, we provide a systematic overview of algorithmic options for incorporating fairness goals in the ML model development pipeline. Within this scope, we also consolidate the space of statistical fairness criteria and examine their adequacy for credit scoring. Second, we perform an empirical study of different fairness processors in a profit-oriented credit scoring setup using seven real-world data sets. The empirical results substantiate the evaluation of fairness measures, identify more and less suitable options for implementing fair credit scoring, and clarify the profit-fairness trade-off in lending decisions. Specifically, we find that multiple fairness criteria can be approximately satisfied at once and identify separation as a proper criterion for measuring the fairness of a scorecard. We also find that fair in-processors deliver a good balance between profit and fairness. More generally, we show that algorithmic discrimination can be reduced to a reasonable level at a relatively low cost.
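To make the separation criterion mentioned above concrete, the following is a minimal illustrative sketch (not the paper's implementation; the function name and toy data are hypothetical). Separation, also known as equalized odds, requires the decision to be independent of the protected group conditional on the true label, i.e. equal true-positive and false-positive rates across groups:

```python
import numpy as np

def separation_gaps(y_true, y_pred, group):
    """Absolute TPR and FPR gaps between two protected groups (0 and 1).

    Separation (equalized odds) holds exactly when both gaps are zero:
    the predicted decision is independent of group membership,
    conditional on the true outcome.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in (0, 1):
        mask = group == g
        tpr = y_pred[mask & (y_true == 1)].mean()  # P(pred=1 | y=1, group=g)
        fpr = y_pred[mask & (y_true == 0)].mean()  # P(pred=1 | y=0, group=g)
        rates[g] = (tpr, fpr)
    tpr_gap = abs(rates[0][0] - rates[1][0])
    fpr_gap = abs(rates[0][1] - rates[1][1])
    return tpr_gap, fpr_gap

# Toy example: binary lending decisions for eight applicants
y_true = [1, 1, 0, 0, 1, 1, 0, 0]  # 1 = good credit risk
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]  # 1 = loan approved
group  = [0, 0, 0, 0, 1, 1, 1, 1]  # protected attribute
tpr_gap, fpr_gap = separation_gaps(y_true, y_pred, group)
```

In this toy example, group 1 enjoys both a higher approval rate among good risks (TPR gap of 0.5) and among bad risks (FPR gap of 0.5), so the scorecard violates separation.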