We study the problem of post-processing a supervised machine-learned regressor so that the binary classifiers obtained by thresholding its scores are as fair as possible at all decision thresholds. Specifically, we show that decreasing the statistical distance between the groups' score distributions increases fair performance across all thresholds at once, and that this can be done without a significant decrease in accuracy. To this end, we introduce a formal measure of distributional parity, which captures how similar the distributions of classifications are across protected groups. Prior work has been limited to studying demographic parity across all thresholds; in contrast, our measure applies to a large class of fairness metrics. Our main result is a novel post-processing algorithm based on optimal transport, which provably maximizes distributional parity. We support this result with experiments on several fairness benchmarks.
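The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two underlying ideas, assuming one real-valued score per example and non-empty groups. `max_parity_gap` computes the largest demographic-parity gap over all thresholds. Because |P(S_a > t) - P(S_b > t)| = |F_a(t) - F_b(t)|, this gap is the Kolmogorov-Smirnov distance between the groups' score distributions. It covers demographic parity only, not the broader measure proposed here. `repair_to_barycenter` is a standard 1-D optimal-transport repair, not claimed to be the authors' method. In one dimension the Wasserstein barycenter's quantile function is the weighted average of the groups' quantile functions, and each group's optimal transport map is its monotone rearrangement onto that barycenter. The function names and the `strength` parameter are hypothetical.

```python
from bisect import bisect_right


def max_parity_gap(scores_a, scores_b):
    """Largest demographic-parity gap over all thresholds t (hypothetical helper).

    |P(S_a > t) - P(S_b > t)| = |F_a(t) - F_b(t)|, so the maximum over t is the
    Kolmogorov-Smirnov distance between the two empirical score distributions.
    Both empirical CDFs only jump at observed scores, so checking those suffices.
    """
    a, b = sorted(scores_a), sorted(scores_b)
    thresholds = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
        for t in thresholds
    )


def _quantile(sorted_scores, u):
    """Empirical quantile function Q(u) for u in (0, 1)."""
    idx = min(int(u * len(sorted_scores)), len(sorted_scores) - 1)
    return sorted_scores[idx]


def repair_to_barycenter(groups, strength=1.0):
    """Move each group's scores toward the 1-D Wasserstein barycenter (sketch).

    groups: dict mapping group name -> list of scores (assumed non-empty).
    A score at quantile level u in its own group is sent to
    sum_h w_h * Q_h(u), with weights w_h proportional to group size.
    strength in [0, 1] interpolates between the original scores (0) and
    full repair (1).
    """
    total = sum(len(s) for s in groups.values())
    sorted_groups = {g: sorted(s) for g, s in groups.items()}
    weights = {g: len(s) / total for g, s in groups.items()}

    repaired = {}
    for g, scores in groups.items():
        n = len(scores)
        # Rank the group's scores; ties are broken by original index.
        order = sorted(range(n), key=lambda i: scores[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            u = (rank + 0.5) / n  # mid-rank quantile level of this score
            target = sum(weights[h] * _quantile(sorted_groups[h], u) for h in groups)
            out[i] = (1 - strength) * scores[i] + strength * target
        repaired[g] = out
    return repaired


if __name__ == "__main__":
    a = [0.1, 0.2, 0.3, 0.4]
    b = [0.5, 0.6, 0.7, 0.8]
    print(max_parity_gap(a, b))  # 1.0: some threshold separates the groups completely
    fixed = repair_to_barycenter({"a": a, "b": b})
    print(max_parity_gap(fixed["a"], fixed["b"]))  # 0.0: same distribution for both groups
```

The repair preserves each group's internal ranking, so within-group ordering (and hence within-group accuracy at any threshold) is unchanged. Setting `strength` below 1 only partially moves the scores, which is one common way to trade parity against accuracy. The abstract does not say whether its method uses such a knob.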