We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public data samples are sufficient to remove any dependence on the range parameters of the private data distribution from the private sample complexity, a dependence that is known to be necessary in the absence of public data. For separated Gaussian mixtures, we assume that the underlying public and private distributions are the same, and we consider two settings: (1) when given a dimension-independent amount of public data, the private sample complexity can be improved polynomially in terms of the number of mixture components, and any dependence on the range parameters of the distribution can be removed in the approximate DP case; (2) when given an amount of public data linear in the dimension, the private sample complexity can be made independent of the range parameters even under concentrated DP, and additional improvements can be made to the overall sample complexity.