Mechanisms used in privacy-preserving machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that such randomization incurs predictive multiplicity: for a given input example, the output predicted by equally-private models depends on the randomness used in training. Thus, for a given input, the predicted output can vary drastically if a model is re-trained, even if the same training dataset is used. The predictive-multiplicity cost of DP training has not been studied, and is currently neither audited for nor communicated to model designers and stakeholders. We derive a bound on the number of re-trainings required to estimate predictive multiplicity reliably. We analyze -- both theoretically and through extensive experiments -- the predictive-multiplicity cost of three DP-ensuring algorithms: output perturbation, objective perturbation, and DP-SGD. We demonstrate that the degree of predictive multiplicity rises as the level of privacy increases, and is unevenly distributed across individuals and demographic groups in the data. Because randomness used to ensure DP during training explains predictions for some examples, our results highlight a fundamental challenge to the justifiability of decisions supported by differentially-private models in high-stakes settings. We conclude that practitioners should audit the predictive multiplicity of their DP-ensuring algorithms before deploying them in applications of individual-level consequence.
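The core phenomenon above — that training randomness alone can flip a model's prediction on a fixed input — can be illustrated with a minimal sketch. The snippet below is not the paper's method; it is a simplified, hypothetical setup: logistic regression trained with DP-SGD-style updates (per-example gradient clipping plus Gaussian noise), re-trained under different seeds, with multiplicity measured as the fraction of re-trained models that disagree with the majority prediction on a query point. All function names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_noisy_logreg(X, y, seed, epochs=50, lr=0.5, clip=1.0, sigma=2.0):
    """Logistic regression with DP-SGD-style updates: clip each
    per-example gradient, then add Gaussian noise (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        # Per-example gradients of the logistic loss.
        per_ex = (preds - y)[:, None] * X
        # Clip each example's gradient to L2 norm <= clip.
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        per_ex = per_ex / np.maximum(1.0, norms / clip)
        # Gaussian noise calibrated to the clipping norm.
        noise = rng.normal(0.0, sigma * clip, size=w.shape)
        w -= lr * (per_ex.sum(axis=0) + noise) / n
    return w

def disagreement(X_train, y_train, x_query, n_models=30):
    """Fraction of re-trained models whose prediction on x_query
    differs from the majority vote -- a simple multiplicity measure."""
    preds = np.array([
        int(x_query @ train_noisy_logreg(X_train, y_train, seed) > 0)
        for seed in range(n_models)
    ])
    majority = int(preds.mean() > 0.5)
    return float((preds != majority).mean())
```

For a point far from the decision boundary, disagreement is typically near zero; for a near-boundary point it can be substantial, and increasing `sigma` (i.e., demanding more privacy) tends to raise it — mirroring the abstract's claim that multiplicity grows with the privacy level.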