As machine learning algorithms are deployed on sensitive data in critical decision-making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and achieve high accuracy on minority subpopulations. We further show that relaxing the overall accuracy requirement can lead to good fairness even under strict privacy requirements. To corroborate our theoretical results in practice, we provide an extensive set of experimental results using a variety of synthetic, vision~(\cifar and CelebA), and tabular~(Law School) datasets and learning algorithms.