As machine learning algorithms are deployed on sensitive data in critical decision making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and achieve high accuracy on minority subpopulations. We further show that relaxing overall accuracy can lead to good fairness even under strict privacy requirements. To corroborate our theoretical results in practice, we provide an extensive set of experimental results using a variety of synthetic, vision (CIFAR10 and CelebA), and tabular (Law School) datasets and learning algorithms.