Public Data-Assisted Mirror Descent for Private Model Training

We revisit the problem of using public data to improve the privacy/utility trade-off of differentially private (DP) model training. Here, public data refers to auxiliary datasets that raise no privacy concerns. We consider public data drawn from the same distribution as the private training data. For convex losses, we show that a variant of Mirror Descent provides population risk guarantees that are independent of the model dimension ($p$). Specifically, we apply Mirror Descent with the loss generated by the public data as the mirror map, and use DP gradients of the loss generated by the private (sensitive) data. To obtain dimension independence, we require $G_Q^2 \leq p$ public data samples, where $G_Q$ is a measure of the isotropy of the loss function. We further show that our algorithm has a natural "noise stability" property: if, around the current iterate, the public loss satisfies $\alpha_v$-strong convexity in a direction $v$, then using noisy gradients instead of the exact gradients shifts our next iterate in the direction $v$ by an amount proportional to $1/\alpha_v$ (in contrast with DP-SGD, where the shift is isotropic). Analogous results in prior works had to explicitly learn the geometry using the public data in the form of preconditioner matrices. Our method is also applicable to non-convex losses, as it does not rely on convexity assumptions to ensure DP guarantees. We demonstrate the empirical efficacy of our algorithm by showing privacy/utility trade-offs on linear regression, deep learning benchmark datasets (WikiText-2, CIFAR-10, and EMNIST), and in federated learning (StackOverflow). We show that our algorithm not only significantly improves over traditional DP-SGD and DP-FedAvg, which do not have access to public data, but also improves over DP-SGD and DP-FedAvg on models that have been pre-trained with the public data to begin with.
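To make the mirror-map idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the public loss is a quadratic with a known positive-definite Hessian $H$ (as in linear regression), so the Bregman divergence it induces is $\frac{1}{2}(\theta - \theta_t)^\top H (\theta - \theta_t)$ and the mirror step has the closed form $\theta_{t+1} = \theta_t - \eta H^{-1} g_t$. The function names `dp_gradient` and `mirror_step_quadratic`, and the parameters `clip_norm` and `noise_multiplier`, are hypothetical names chosen for illustration; the DP gradient is the usual clip-and-add-Gaussian-noise construction, which is an assumption about how the private gradients are formed.

```python
import numpy as np


def dp_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Noisy average of clipped per-example gradients (assumed Gaussian mechanism).

    per_example_grads: array of shape (n, p), one private gradient per row.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale each row so its L2 norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise calibrated to the clipping norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)


def mirror_step_quadratic(theta, noisy_grad, public_hessian, lr):
    """One mirror descent step when the mirror map is a quadratic public loss.

    Solves argmin_x lr * <g, x> + 0.5 * (x - theta)^T H (x - theta),
    whose minimizer is theta - lr * H^{-1} g.
    """
    return theta - lr * np.linalg.solve(public_hessian, noisy_grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical public Hessian: curvature 4 along axis 0, 1 along axis 1.
    H = np.diag([4.0, 1.0])
    theta = np.zeros(2)
    grads = rng.normal(size=(8, 2))  # stand-in private per-example gradients
    g = dp_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
    print(mirror_step_quadratic(theta, g, H, lr=0.1))
```

In this quadratic setting, any noise in `noisy_grad` along an eigendirection $v$ of $H$ with eigenvalue $\alpha_v$ moves the iterate by an amount scaled by $1/\alpha_v$, which is the anisotropic behaviour the abstract contrasts with the isotropic shift of DP-SGD.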

