In many statistical problems, incorporating priors can significantly improve performance. However, the use of prior knowledge in differentially private query release has remained underexplored, despite such priors commonly being available in the form of public datasets, such as previous US Census releases. With the goal of releasing statistics about a private dataset, we present PMW^Pub, which -- unlike existing baselines -- leverages public data drawn from a related distribution as prior information. We provide a theoretical analysis and an empirical evaluation on the American Community Survey (ACS) and ADULT datasets, which shows that our method outperforms state-of-the-art methods. Furthermore, PMW^Pub scales well to high-dimensional data domains, where running many existing methods would be computationally infeasible.
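To make the idea of using public data as a prior concrete, the following is a minimal sketch of a generic MWEM-style multiplicative-weights loop whose starting distribution is a public histogram rather than the uniform distribution. It is an illustration under stated assumptions, not the PMW^Pub algorithm itself. The function name `mw_with_public_prior`, the histogram representation, the even budget split across rounds, and the update step are all choices made for this example, and the privacy accounting shown has not been checked against the paper's analysis.

```python
import numpy as np


def mw_with_public_prior(private_hist, public_hist, queries, epsilon, rounds, seed=0):
    """Illustrative MWEM-style loop initialized from a public histogram.

    private_hist, public_hist: 1-D arrays of counts over the same domain.
    queries: 2-D array of shape (num_queries, domain_size) with entries in [0, 1].
    All names and the budget split are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    n_records = private_hist.sum()
    true_dist = private_hist / n_records

    # Prior: start from the public distribution instead of the uniform one.
    approx = public_hist / public_hist.sum()

    # Assumed split: half of each round's budget for selection, half for measurement.
    eps_round = epsilon / (2 * rounds)
    sensitivity = 1.0 / n_records  # sensitivity of a normalized linear query

    for _ in range(rounds):
        # Exponential mechanism: favor queries the current approximation answers badly.
        errors = np.abs(queries @ approx - queries @ true_dist)
        logits = eps_round * errors / (2 * sensitivity)
        probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
        probs /= probs.sum()
        idx = rng.choice(len(queries), p=probs)

        # Laplace mechanism: noisy answer to the selected query on the private data.
        noisy = queries[idx] @ true_dist + rng.laplace(scale=sensitivity / eps_round)

        # Multiplicative-weights update toward the noisy measurement, then renormalize.
        approx = approx * np.exp(queries[idx] * (noisy - queries[idx] @ approx) / 2)
        approx /= approx.sum()

    return approx
```

Because the update is multiplicative, any domain element with zero mass in the public histogram keeps zero mass throughout. In this sketch the public data therefore fixes the support of the released distribution as well as its starting point, which is one way a prior can keep the working domain small when the full data domain is high-dimensional.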