Differential Privacy (DP) is an important privacy-enhancing technology for private machine learning systems. It allows one to measure and bound the risk associated with an individual's participation in a computation. However, it was recently observed that DP learning systems may exacerbate bias and unfairness for different groups of individuals. This paper builds on these important observations and sheds light on the causes of the disparate impacts arising in the problem of differentially private empirical risk minimization. It focuses on the accuracy disparity arising among groups of individuals in two well-studied DP learning methods: output perturbation and differentially private stochastic gradient descent. The paper analyzes which data and model properties are responsible for the disproportionate impacts, why these aspects affect different groups disproportionately, and proposes guidelines to mitigate these effects. The proposed approach is evaluated on several datasets and settings.
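The two mechanisms named above differ mainly in where noise is injected: output perturbation adds noise to the trained parameters, while DP-SGD clips and noises per-example gradients during training. The following is a minimal NumPy sketch of each, offered only to fix ideas; it is not the paper's implementation, and the parameter names (sensitivity, epsilon, clip_norm, noise_multiplier, lr) are illustrative assumptions.

```python
# Minimal sketch of the two DP learning methods discussed in the abstract.
# Not the paper's code; hyperparameter names and noise choices are assumptions.
import numpy as np

def output_perturbation(theta_hat, sensitivity, epsilon, rng=None):
    """Output perturbation: train non-privately, then add Laplace noise
    calibrated to the (assumed) L1 sensitivity of the learned parameters."""
    rng = rng or np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=theta_hat.shape)
    return theta_hat + noise

def dp_sgd_step(theta, per_example_grads, clip_norm, noise_multiplier, lr, rng=None):
    """One DP-SGD step: clip each per-example gradient to clip_norm,
    average, add Gaussian noise scaled to the clipping norm, then descend."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=avg.shape)
    return theta - lr * (avg + noise)
```

Both the clipping and the added noise perturb some examples' contributions more than others, which is the kind of mechanism-level effect the paper examines when explaining accuracy disparities across groups.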