The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low variance, and low privacy loss for arbitrary distributions. On the positive side, we show that unbiased mean estimation is possible under approximate differential privacy if we assume that the distribution is symmetric. Furthermore, we show that, even if we assume that the data is sampled from a Gaussian, unbiased mean estimation is impossible under pure or concentrated differential privacy.
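The clip-then-noise procedure described above can be sketched as follows. This is an illustrative implementation, not the paper's construction: it clips samples to an assumed range [-clip, clip] and uses the Laplace mechanism for pure epsilon-DP, with the function name and parameters chosen here for exposition.

```python
import numpy as np

def dp_clipped_mean(samples, clip, epsilon, rng=None):
    """Clip-then-noise mean estimate satisfying epsilon-DP (Laplace mechanism).

    Clipping each sample to [-clip, clip] bounds the replace-one sensitivity
    of the empirical mean at 2*clip/n, so Laplace noise with scale
    2*clip/(n*epsilon) suffices for epsilon-DP. The tradeoff from the
    abstract is visible here: a larger clip means more noise (variance),
    while a smaller clip truncates mass outside the range (bias).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(samples, dtype=float), -clip, clip)
    sensitivity = 2.0 * clip / len(x)          # replace-one sensitivity of the mean
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return x.mean() + noise
```

With data well inside the clipping range the estimate is nearly unbiased, whereas data outside the range is pulled toward the boundary, illustrating the clipping bias the paper proves is unavoidable in general.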