Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications. A naive solution is to transform the data so that it is statistically independent of group membership, but this may throw away too much information when a reasonable compromise between fairness and accuracy is desired. Another common approach is to limit the ability of a particular adversary who seeks to maximize parity. Unfortunately, representations produced by adversarial approaches may still retain biases as their efficacy is tied to the complexity of the adversary used during training. To this end, we theoretically establish that by limiting the mutual information between representations and protected attributes, we can assuredly control the parity of any downstream classifier. We demonstrate an effective method for controlling parity through mutual information based on contrastive information estimators and show that they outperform approaches that rely on variational bounds based on complex generative models. We test our approach on UCI Adult and Heritage Health datasets and demonstrate that our approach provides more informative representations across a range of desired parity thresholds while providing strong theoretical guarantees on the parity of any downstream algorithm.
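To make the flavor of the guarantee concrete, here is a minimal sketch of how an information cap translates into a parity bound, assuming a binary protected attribute $c$ and a binary prediction $\hat{y} = f(z)$; the derivation below is a standard data-processing plus Pinsker argument given for illustration, and its constants need not match the paper's exact bound. Because $c \to x \to z \to \hat{y}$ forms a Markov chain, the data-processing inequality gives $I(\hat{y}; c) \le I(z; c)$. Writing $p_a = P(c = a)$, $q_a = P(\hat{y} = 1 \mid c = a)$, and $\Delta_{\mathrm{DP}} = |q_1 - q_0|$ for the demographic parity gap, Pinsker's inequality applied to $I(\hat{y}; c) = D_{\mathrm{KL}}\big(P(\hat{y}, c) \,\|\, P(\hat{y})\,P(c)\big)$ yields

$$ I(z; c) \;\ge\; I(\hat{y}; c) \;\ge\; 8\, p_0^2\, p_1^2\, \Delta_{\mathrm{DP}}^2 \quad\Longrightarrow\quad \Delta_{\mathrm{DP}} \;\le\; \frac{\sqrt{I(z; c)}}{2\sqrt{2}\, p_0\, p_1}. $$

Hence enforcing $I(z; c) \le \epsilon$ during representation learning caps the parity gap of every downstream classifier built on $z$, regardless of its complexity.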
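On the estimation side, the contrastive estimators mentioned above are typically of the InfoNCE family. The following is a minimal PyTorch sketch of such an estimator, not the paper's implementation; the names `BilinearCritic` and `infonce_lower_bound` are illustrative. Maximizing this lower bound on $I(x; z)$ keeps representations informative, while a separate (upper) bound on $I(z; c)$ is driven below the desired parity threshold.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class BilinearCritic(nn.Module):
    """Scores all (z_i, v_j) pairs in a batch with a learned bilinear form."""

    def __init__(self, d_z: int, d_v: int):
        super().__init__()
        self.proj = nn.Linear(d_v, d_z, bias=False)

    def forward(self, z: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # (B, d_z) @ (d_z, B) -> (B, B); entry (i, j) scores z_i against v_j.
        return z @ self.proj(v).t()


def infonce_lower_bound(z: torch.Tensor, v: torch.Tensor,
                        critic: nn.Module) -> torch.Tensor:
    """InfoNCE lower bound on I(z; v): I(z; v) >= log B - CE(scores, diag)."""
    scores = critic(z, v)                       # (B, B) pairwise scores
    labels = torch.arange(scores.shape[0], device=scores.device)
    # Each row is a B-way classification problem: the matched pair on the
    # diagonal is the positive; the other B - 1 entries act as negatives.
    return math.log(scores.shape[0]) - F.cross_entropy(scores, labels)


# Illustrative usage: estimate I(x; z) for a batch of encoded inputs.
B, d_x, d_z = 128, 32, 16
encoder = nn.Linear(d_x, d_z)
critic = BilinearCritic(d_z, d_x)
x = torch.randn(B, d_x)
mi_xz = infonce_lower_bound(encoder(x), x, critic)  # maximize this term
```

A bilinear critic is a common choice because scoring all $B^2$ pairs in a batch costs a single matrix product; note that the InfoNCE bound saturates at $\log B$, so larger batches permit tighter estimates of large mutual information values.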