Costs and Benefits of Fair Regression

Real-world applications of machine learning tools in high-stakes domains are often regulated to be fair, in the sense that the predicted target should satisfy some quantitative notion of parity with respect to a protected attribute. However, the exact tradeoff between fairness and accuracy with a real-valued target is not entirely clear. In this paper, we characterize the inherent tradeoff between statistical parity and accuracy in the regression setting by providing a lower bound on the error of any fair regressor. Our lower bound is sharp, algorithm-independent, and admits a simple interpretation: when the moments of the target differ between groups, any fair algorithm has to make an error on at least one of the groups. We further extend this result to give a lower bound on the joint error of any (approximately) fair algorithm, using the Wasserstein distance to measure the quality of the approximation. With our novel lower bound, we also show that the price paid by a fair regressor that does not take the protected attribute as input is less than that of a fair regressor with explicit access to the protected attribute. On the upside, we establish the first connection between individual fairness, accuracy parity, and the Wasserstein distance by showing that if a regressor is individually fair, it also approximately satisfies accuracy parity, where the gap is given by the Wasserstein distance between the two groups. Inspired by our theoretical results, we develop a practical algorithm for fair regression through the lens of representation learning, and conduct experiments on a real-world dataset to corroborate our findings.
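The first claim can be illustrated numerically. The snippet below is a minimal sketch under stated assumptions, not the paper's theorem or its algorithm. Under exact statistical parity, the predictions have the same distribution in both groups, so the triangle inequality for the 1-Wasserstein distance implies that the two per-group mean absolute errors sum to at least W_1 between the group-wise target distributions. The Gaussian targets, the group means, the sample size, the constant predictor, and the helper names `wasserstein_1` and `mean_abs_error` are all assumptions introduced for illustration.

```python
import random


def wasserstein_1(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches sorted values, so the
    distance is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean_abs_error(targets, prediction):
    """Mean absolute error of a constant prediction on one group."""
    return sum(abs(t - prediction) for t in targets) / len(targets)


# Synthetic targets (assumption): the two groups differ in their mean.
rng = random.Random(0)
n = 2000
y_group0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_group1 = [rng.gauss(2.0, 1.0) for _ in range(n)]

# Distance between the group-wise target distributions.
gap = wasserstein_1(y_group0, y_group1)

# A constant predictor trivially satisfies statistical parity, because its
# output distribution is the same point mass in both groups. The pooled
# median is used here as one reasonable choice of constant.
pooled = sorted(y_group0 + y_group1)
constant = pooled[len(pooled) // 2]

joint_error = mean_abs_error(y_group0, constant) + mean_abs_error(y_group1, constant)

print(f"W1 between group targets: {gap:.3f}")
print(f"joint MAE of fair constant predictor: {joint_error:.3f}")
```

Any other constant can be substituted for the pooled median: the joint error never drops below the printed Wasserstein gap, which matches the interpretation that a fair predictor must err on at least one group when the target distributions differ.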
