We use disparate impact, i.e., the extent to which the probability of observing an output depends on protected attributes such as race and gender, to measure fairness. We prove that disparate impact is upper bounded by the total variation distance between the distributions of the inputs conditioned on the protected attributes. We then use pre-processing, also known as data repair, to enforce fairness. We show that utility degradation, i.e., the extent to which the success of a forecasting model changes when it is trained on pre-processed data, is upper bounded by the total variation distance between the distributions of the data before and after pre-processing. Hence, the problem of finding the optimal pre-processing regimen for enforcing fairness can be cast as minimizing the total variation distance between the distributions of the data before and after pre-processing, subject to a constraint on the total variation distance between the distributions of the inputs conditioned on the protected attributes. This problem is a linear program that can be solved efficiently. We show that it is intimately related to finding the barycenter (i.e., center of mass) of two distributions when distances in the probability space are measured by the total variation distance. We also investigate the effect of differential privacy on fairness using the proposed total variation distance bounds. We demonstrate the results through numerical experiments on a practical dataset.
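The optimization problem described above can be written out explicitly; the following formalization is a sketch in our own notation (the symbols P_s, pi_s, and epsilon are illustrative assumptions, not necessarily the paper's notation). Here P_s denotes the distribution of the inputs given protected attribute S = s on a finite alphabet, the repaired distributions are the decision variables, and epsilon is the fairness budget:

\begin{align*}
  \min_{\tilde{P}_0,\,\tilde{P}_1}\;
    & \sum_{s\in\{0,1\}} \pi_s\, d_{\mathrm{TV}}\!\bigl(P_s, \tilde{P}_s\bigr)
  && \text{(utility degradation)}\\
  \text{s.t.}\;
    & d_{\mathrm{TV}}\!\bigl(\tilde{P}_0, \tilde{P}_1\bigr) \le \epsilon,
  && \text{(fairness constraint)}\\
    & \tilde{P}_s(x) \ge 0,\quad \textstyle\sum_x \tilde{P}_s(x) = 1,
  && s \in \{0,1\},
\end{align*}

where $d_{\mathrm{TV}}(p,q) = \tfrac{1}{2}\sum_x |p(x)-q(x)|$ on a finite alphabet. Replacing each absolute value with an auxiliary variable and two linear inequalities turns the problem into a linear program, which is what makes it efficiently solvable; setting $\epsilon = 0$ forces $\tilde{P}_0 = \tilde{P}_1$ and recovers the barycenter interpretation.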
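To make the linear-program reduction concrete, the following is a minimal sketch, not the authors' implementation: it solves a toy instance of the repair problem on a two-letter alphabet with scipy.optimize.linprog. The function name repair_lp and the inputs p0, p1 (empirical input distributions given the protected attribute), pi0, pi1 (group weights), and eps (fairness budget) are all illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def repair_lp(p0, p1, pi0, pi1, eps):
    """Minimize pi0*TV(p0,q0) + pi1*TV(p1,q1) subject to TV(q0,q1) <= eps."""
    n = len(p0)
    I, Z = np.eye(n), np.zeros((n, n))
    # Variable layout: x = [q0, q1, u0, u1, v], each of length n.
    # u_s linearizes |p_s - q_s|; v linearizes |q0 - q1|.
    c = np.concatenate([np.zeros(2 * n), 0.5 * pi0 * np.ones(n),
                        0.5 * pi1 * np.ones(n), np.zeros(n)])
    A_ub = np.block([
        [ I,  Z, -I,  Z,  Z],   #  q0 - u0 <=  p0
        [-I,  Z, -I,  Z,  Z],   # -q0 - u0 <= -p0
        [ Z,  I,  Z, -I,  Z],   #  q1 - u1 <=  p1
        [ Z, -I,  Z, -I,  Z],   # -q1 - u1 <= -p1
        [ I, -I,  Z,  Z, -I],   #  q0 - q1 - v <= 0
        [-I,  I,  Z,  Z, -I],   #  q1 - q0 - v <= 0
    ])
    b_ub = np.concatenate([p0, -p0, p1, -p1, np.zeros(2 * n)])
    # Fairness budget: 0.5 * sum(v) <= eps.
    A_ub = np.vstack([A_ub,
                      np.concatenate([np.zeros(4 * n), 0.5 * np.ones(n)])])
    b_ub = np.append(b_ub, eps)
    # The repaired distributions q0 and q1 must each sum to one.
    A_eq = np.vstack([np.concatenate([np.ones(n), np.zeros(4 * n)]),
                      np.concatenate([np.zeros(n), np.ones(n), np.zeros(3 * n)])])
    b_eq = np.array([1.0, 1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    q0, q1 = res.x[:n], res.x[n:2 * n]
    return q0, q1, res.fun  # repaired distributions, utility-degradation bound

# Toy example: two skewed binary-input distributions with TV distance 0.6
# and fairness budget 0.1; the optimal cost is 0.5 * (0.6 - 0.1) = 0.25.
q0, q1, cost = repair_lp(np.array([0.9, 0.1]), np.array([0.3, 0.7]),
                         pi0=0.5, pi1=0.5, eps=0.1)
print(q0, q1, cost)

The absolute values in the two total variation distances are the only nonlinearities, so the auxiliary variables u0, u1, and v suffice to express the entire problem in the standard linprog form.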