We study private synthetic data generation for query release, where the goal is to construct a sanitized version of a sensitive dataset, subject to differential privacy, that approximately preserves the answers to a large collection of statistical queries. We first present an algorithmic framework that unifies a long line of iterative algorithms in the literature. Under this framework, we propose two new methods. The first method, private entropy projection (PEP), can be viewed as an advanced variant of MWEM that adaptively reuses past query measurements to boost accuracy. Our second method, generative networks with the exponential mechanism (GEM), circumvents computational bottlenecks in algorithms such as MWEM and PEP by optimizing over generative models parameterized by neural networks, which capture a rich family of distributions while enabling fast gradient-based optimization. We demonstrate that PEP and GEM empirically outperform existing algorithms. Furthermore, we show that GEM nicely incorporates prior information from public data while overcoming limitations of PMW^Pub, the existing state-of-the-art method that also leverages public data.
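For readers unfamiliar with the iterative algorithms that the framework unifies, the following is a minimal sketch of the classic MWEM loop (select a poorly answered query with the exponential mechanism, measure it with Laplace noise, update the synthetic distribution with multiplicative weights). It is not PEP or GEM, and it is not taken from the paper. The function name `mwem_sketch`, its parameters, the even budget split per round, and the use of basic composition are all assumptions made for illustration. The dataset size `n` is treated as public.

```python
import numpy as np


def mwem_sketch(hist, queries, epsilon, rounds, rng):
    """Illustrative MWEM-style loop (hypothetical helper, not the paper's code).

    hist:    length-N array of record counts over a discrete domain.
    queries: (k, N) array of linear queries with entries in [0, 1].
    Returns a length-N probability vector serving as the synthetic distribution.
    """
    n = hist.sum()                      # assumed public in this sketch
    domain_size = hist.shape[0]
    p = np.full(domain_size, 1.0 / domain_size)  # start from the uniform distribution
    eps_round = epsilon / rounds        # basic composition across rounds (assumption)
    true_answers = queries @ hist       # count answers, each with sensitivity 1

    for _ in range(rounds):
        # Select: exponential mechanism with budget eps_round / 2,
        # scoring each query by its current absolute error (sensitivity 1).
        errors = np.abs(true_answers - n * (queries @ p))
        logits = (eps_round / 2.0) * errors / 2.0
        logits -= logits.max()          # stabilize before exponentiating
        probs = np.exp(logits)
        probs /= probs.sum()
        i = rng.choice(len(queries), p=probs)

        # Measure: Laplace mechanism with budget eps_round / 2.
        measurement = true_answers[i] + rng.laplace(scale=2.0 / eps_round)

        # Update: multiplicative weights step toward the noisy measurement.
        estimate = n * (queries[i] @ p)
        p = p * np.exp(queries[i] * (measurement - estimate) / (2.0 * n))
        p /= p.sum()

    return p


# Usage on a toy domain of four elements with one point query per element.
toy_hist = np.array([70.0, 10.0, 10.0, 10.0])
toy_queries = np.eye(4)
toy_p = mwem_sketch(toy_hist, toy_queries, epsilon=1.0, rounds=5,
                    rng=np.random.default_rng(0))
```

The sketch stores the distribution explicitly over the whole domain, which is the computational bottleneck the abstract attributes to MWEM and PEP; GEM is described as replacing this explicit representation with a neural generative model trained by gradient-based optimization.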