Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector that can be shared with analysts and used to perform arbitrarily many analyses. However, the algorithms that perform specific tasks from sketches must be developed on a case-by-case basis, which is a major impediment to their use. In this paper, we introduce the generic moment-to-moment (M$^2$M) method to perform a wide range of data exploration tasks from a single private sketch. Among other things, this method can be used to estimate empirical moments of attributes, the covariance matrix, counting queries (including histograms), and regression models. Our method treats the sketching mechanism as a black-box operation, and can thus be applied to a wide variety of sketches from the literature. This widens their range of applications without further engineering or privacy loss, and removes some of the technical barriers to the wider adoption of sketches for data exploration under differential privacy. We validate our method with data exploration tasks on artificial and real-world data, and show that it can be used to reliably estimate statistics and train classification models from private sketches.
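To make the notion of a private sketch concrete, the following is a minimal illustration of a generic noisy sketching mechanism. It is not the M$^2$M method, which operates on top of such a mechanism. Everything in it is an assumption made for illustration: the random cosine feature map, the Laplace noise, the names `make_feature_map`, `laplace` and `private_sketch`, and the privacy model (neighbouring datasets differ by replacing one record, and the dataset size is treated as public).

```python
import math
import random


def make_feature_map(dim, sketch_size, rng):
    """Draw random frequencies and phases for cosine features (hypothetical choice)."""
    freqs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(sketch_size)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(sketch_size)]

    def phi(x):
        # Every feature lies in [-1, 1], which bounds the sensitivity used below.
        return [
            math.cos(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(freqs, phases)
        ]

    return phi


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sketch(data, phi, epsilon, rng):
    """Return the noisy average of phi over the dataset (assumed mechanism)."""
    n = len(data)
    feats = [phi(x) for x in data]
    m = len(feats[0])
    mean = [sum(f[j] for f in feats) / n for j in range(m)]
    # Replacing one record moves each coordinate of the mean by at most 2/n,
    # so the L1 sensitivity of the mean is at most 2*m/n.
    scale = 2.0 * m / (n * epsilon)
    return [v + laplace(scale, rng) for v in mean]


rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(1000)]
phi = make_feature_map(dim=2, sketch_size=8, rng=rng)
sketch = private_sketch(data, phi, epsilon=1.0, rng=rng)
```

Once `sketch` has been released, any further computation on it is post-processing and consumes no additional privacy budget, which is why a single sketch can serve arbitrarily many analyses.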