Joint distribution estimation of a dataset under differential privacy is a fundamental problem for many privacy-focused applications, such as query answering, machine learning tasks, and synthetic data generation. In this work, we examine the joint distribution estimation problem given two inputs: 1) differentially private answers to a workload computed over private data, and 2) a prior empirical distribution from a public dataset. Our goal is to find a new distribution such that answering the workload using this distribution is as accurate as the differentially private answers, while the relative entropy, or KL divergence, of this distribution with respect to the prior is minimized. We propose an approach based on iterative optimization for solving this problem. An application of our solution won second place in the NIST 2020 Differential Privacy Temporal Map Challenge, Sprint 2.
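The core computation described above, fitting a distribution to noisy workload answers while staying close to a public prior in KL divergence, can be sketched as an entropic mirror-descent loop. The sketch below is illustrative only and assumes a linear workload matrix `W` and noisy answer vector `y`; the function name `kl_fit` and all parameters are hypothetical, not the authors' actual method or API.

```python
import numpy as np

def kl_fit(prior, W, y, lr=0.5, iters=500):
    """Illustrative sketch: adjust `prior` so that W @ p approximates the
    noisy workload answers y, via multiplicative (mirror-descent) updates.
    Starting from the prior and taking entropic steps keeps the iterate
    close to the prior in KL divergence."""
    p = prior.copy()
    for _ in range(iters):
        grad = W.T @ (W @ p - y)     # gradient of 0.5 * ||W p - y||^2
        p = p * np.exp(-lr * grad)   # multiplicative update (mirror step)
        p = p / p.sum()              # renormalize to a probability vector
    return p

# Toy usage: a 4-cell domain, identity workload, uniform public prior.
prior = np.full(4, 0.25)
W = np.eye(4)
y = np.array([0.4, 0.3, 0.2, 0.1])  # stand-in for noisy private answers
p = kl_fit(prior, W, y)
```

Here the fitted `p` converges toward `y` while remaining a valid distribution; with a richer workload (e.g., low-order marginals), the KL anchor to the prior fills in the dimensions the workload does not constrain.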