Real-world applications of machine learning tools in high-stakes domains are often regulated to be fair, in the sense that the predicted target should satisfy some quantitative notion of parity with respect to a protected attribute. However, the exact tradeoff between fairness and accuracy with a real-valued target is not well understood. In this paper, we characterize the inherent tradeoff between statistical parity and accuracy in the regression setting by providing a lower bound on the error of any fair regressor. Our lower bound is sharp, algorithm-independent, and admits a simple interpretation: when the moments of the target differ between groups, any fair algorithm has to make a large error on at least one of the groups. We further extend this result to give a lower bound on the joint error of any (approximately) fair algorithm, using the Wasserstein distance to measure the quality of the approximation. On the positive side, we establish the first connection between individual fairness, accuracy parity, and the Wasserstein distance by showing that if a regressor is individually fair, it also approximately satisfies accuracy parity, where the gap is given by the Wasserstein distance between the two groups. Inspired by our theoretical results, we develop a practical algorithm for fair regression through the lens of representation learning, and conduct experiments on a real-world dataset to corroborate our findings.
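To make the stated interpretation concrete, the display below works out a minimal first-moment instance of the tradeoff. It is an illustrative sketch under an exact statistical-parity assumption and with the group-wise absolute error, not the paper's general bound; the symbols $\mu_a$, $\Delta$, $m$, and $\varepsilon_a$ are introduced here only for this illustration.

\begin{align*}
  &\text{Assume } \hat{Y} \perp A \ (\text{exact statistical parity}), \qquad
    \mu_a := \mathbb{E}[Y \mid A = a], \qquad \Delta := |\mu_0 - \mu_1|. \\
  &\text{Parity forces a common predicted mean } m := \mathbb{E}[\hat{Y} \mid A = 0] = \mathbb{E}[\hat{Y} \mid A = 1]. \\
  &\text{By the triangle inequality, } |m - \mu_0| + |m - \mu_1| \ge \Delta,
    \ \text{ so } \ \max_a |m - \mu_a| \ge \Delta / 2. \\
  &\text{Writing } \varepsilon_a := \mathbb{E}\big[\,|\hat{Y} - Y|\mid A = a\big]
    \ \text{for the group-wise } \ell_1 \text{ error, Jensen's inequality gives }
    \varepsilon_a \ge \big|\mathbb{E}[\hat{Y} - Y \mid A = a]\big| = |m - \mu_a|, \\
  &\text{hence } \max\{\varepsilon_0, \varepsilon_1\} \ge \Delta / 2:
    \text{ a mean gap between the groups forces a large error on at least one of them.}
\end{align*}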