Differential privacy is widely adopted to provide provable privacy guarantees in data analysis. We consider the problem of combining public and private data (and, more generally, data with heterogeneous privacy needs) to estimate aggregate statistics. We introduce a mixed estimator of the mean that is optimized to minimize variance. We argue that our mechanism is preferable to techniques that preserve individual privacy by subsampling data in proportion to users' privacy needs. Similarly, we present a mixed median estimator based on the exponential mechanism. We compare our mechanisms to the methods proposed in Jorgensen et al. [2015]. Our experiments provide empirical evidence that our mechanisms often outperform these baseline methods.
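As a rough illustration of the two kinds of estimator described above, the Python sketch below blends a public sample mean with a Laplace-noised private mean using inverse-variance weights, and selects a median with the exponential mechanism over a fixed candidate grid. It is not the authors' construction. The bounded domain `[lo, hi]`, the inverse-variance weighting, the replace-one neighboring relation, the shared data distribution, and the hypothetical helper names `mixed_mean` and `mixed_median` are all assumptions made for illustration.

```python
import math
import random
import statistics


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exponential(mean=scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mixed_mean(public, private, epsilon, lo=0.0, hi=1.0, rng=None):
    """Hypothetical helper: inverse-variance blend of a public mean and an epsilon-DP private mean.

    Assumptions: values lie in [lo, hi]; both samples share one distribution;
    len(public) >= 2; len(private) is not itself secret. Returns (estimate, public_weight).
    """
    rng = rng if rng is not None else random.Random()
    pub = [min(max(x, lo), hi) for x in public]
    priv = [min(max(x, lo), hi) for x in private]
    # Laplace mechanism: replacing one private record moves the clipped mean by at most (hi - lo) / n.
    scale = (hi - lo) / (len(priv) * epsilon)
    private_mean = statistics.fmean(priv) + laplace_noise(scale, rng)
    public_mean = statistics.fmean(pub)
    # Estimate the data variance from public data only, so no privacy budget is spent on it.
    sigma2 = statistics.variance(pub)
    var_pub = sigma2 / len(pub)
    var_priv = sigma2 / len(priv) + 2.0 * scale ** 2  # sampling variance + Laplace noise variance
    # For two independent unbiased estimators, w = var_priv / (var_pub + var_priv)
    # minimizes the variance of w * public_mean + (1 - w) * private_mean.
    w_pub = var_priv / (var_pub + var_priv)
    return w_pub * public_mean + (1.0 - w_pub) * private_mean, w_pub


def mixed_median(public, private, epsilon, lo=0.0, hi=1.0, grid_size=101, rng=None):
    """Hypothetical helper: exponential-mechanism median over a fixed grid in [lo, hi].

    Public points enter the score at no privacy cost; only private records are protected.
    """
    rng = rng if rng is not None else random.Random()
    data = list(public) + list(private)
    candidates = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    # Replacing one private record shifts (#below - #above) by at most 2.
    sensitivity = 2.0

    def utility(r):
        below = sum(1 for x in data if x < r)
        above = sum(1 for x in data if x > r)
        return -abs(below - above)

    # Exponential mechanism: Pr[r] is proportional to exp(epsilon * u(r) / (2 * sensitivity)).
    scores = [epsilon * utility(r) / (2.0 * sensitivity) for r in candidates]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    public = [i / 10 for i in range(11)]
    private = public * 10
    estimate, weight = mixed_mean(public, private, epsilon=1.0, rng=rng)
    print("mixed mean:", estimate, "public weight:", weight)
    print("mixed median:", mixed_median(public, private, epsilon=1.0, rng=rng))
```

In this sketch the mean's weight depends on the private data only through its size and ε, so, under the stated assumption that the private sample size is public, choosing the weight spends no extra privacy budget. As ε shrinks, the Laplace noise variance grows and the weight shifts toward the public mean.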