To address the bias exhibited by machine learning models, fairness criteria impose statistical constraints ensuring equal treatment of all demographic groups, but typically at a cost to model performance. Understanding this tradeoff therefore underlies the design of fair and effective algorithms. This paper completes the characterization of the inherent tradeoff of demographic parity for classification problems in the most general multigroup, multiclass, and noisy setting. Specifically, we show that the minimum error rate achievable under demographic parity is given by the optimal value of a Wasserstein-barycenter problem. More practically, this reformulation leads to a simple procedure for post-processing any pre-trained predictor to satisfy demographic parity in the general setting, which, in particular, yields the optimal fair classifier when applied to the Bayes predictor. We provide suboptimality and finite-sample analyses for our procedure, and demonstrate precise control of the tradeoff between error rate and fairness on real-world datasets, provided sufficient data is available.
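As a rough, hypothetical sketch of the barycenter idea in the simplest case (binary classification with real-valued scores and an observed group attribute; the paper's procedure handles the general multigroup, multiclass setting), one can post-process scores by transporting each group's empirical score distribution onto the 1-D Wasserstein barycenter of all groups' distributions, whose quantile function is the weighted average of the per-group quantile functions. After this adjustment the score distribution is (approximately) identical across groups, so thresholding yields demographic parity. The function name and grid discretization below are illustrative assumptions, not from the paper.

```python
import numpy as np

def barycenter_postprocess(scores, groups, grid_size=512):
    """Illustrative sketch (not the paper's exact algorithm): map each
    group's empirical score distribution onto the 1-D Wasserstein
    barycenter of all groups' score distributions, so the adjusted
    scores are (near-)identically distributed across groups."""
    qs = np.linspace(0.0, 1.0, grid_size)
    group_ids, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    # Empirical quantile function of each group's scores on a common grid.
    quantiles = {g: np.quantile(scores[groups == g], qs) for g in group_ids}
    # In 1-D, the barycenter's quantile function is the weighted average
    # of the groups' quantile functions.
    bary_q = sum(w * quantiles[g] for g, w in zip(group_ids, weights))
    adjusted = np.empty_like(scores, dtype=float)
    for g in group_ids:
        mask = groups == g
        sorted_g = np.sort(scores[mask])
        # Empirical CDF value (rank) of each score within its own group...
        ranks = np.searchsorted(sorted_g, scores[mask], side="right") / mask.sum()
        # ...pushed through the barycenter quantile function.
        adjusted[mask] = np.interp(ranks, qs, bary_q)
    return adjusted
```

Thresholding the adjusted scores (e.g., at 0.5) then produces near-equal positive-prediction rates across groups, and the accuracy cost of the adjustment reflects the transport cost to the barycenter, mirroring the tradeoff the abstract describes.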