We propose a general approach for differentially private synthetic data generation that consists of three steps: (1) select a collection of low-dimensional marginals, (2) measure those marginals with a noise-addition mechanism, and (3) generate synthetic data that preserves the measured marginals well. Central to this approach is Private-PGM, a post-processing method that estimates a high-dimensional data distribution from noisy measurements of its marginals. We present two mechanisms, NIST-MST and MST, that are instances of this general approach. NIST-MST was the winning mechanism in the 2018 NIST differential privacy synthetic data competition; MST is a new mechanism that works in more general settings while still performing comparably to NIST-MST. We believe our general approach should be of broad interest and can be adopted in future mechanisms for synthetic data generation.
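
To make the three-step structure concrete, the following is a minimal sketch in Python with NumPy. It is not NIST-MST, MST, or Private-PGM: the selection step is replaced by a fixed chain of adjacent attribute pairs, the measurement step uses Gaussian noise with a noise scale `sigma` that is taken as given (its calibration to a privacy budget is not shown), and the generation step uses naive clip-and-normalize post-processing followed by sampling along the chain. All function names (`measure_marginal`, `to_distribution`, `synthesize_chain`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def measure_marginal(data, cols, domain, sigma, rng):
    """Step 2: contingency table over `cols` plus Gaussian noise.

    Assumption: `sigma` is already calibrated to the desired privacy
    guarantee; that calibration is outside this sketch.
    """
    shape = tuple(domain[c] for c in cols)
    counts = np.zeros(shape)
    # Count how many records fall into each cell of the marginal.
    np.add.at(counts, tuple(data[:, c] for c in cols), 1)
    return counts + rng.normal(0.0, sigma, size=shape)


def to_distribution(noisy):
    """Naive post-processing (a stand-in, not Private-PGM):
    clip negative noisy counts to zero and normalize."""
    p = np.clip(noisy, 0.0, None)
    total = p.sum()
    if total <= 0:
        return np.full(p.shape, 1.0 / p.size)
    return p / total


def synthesize_chain(data, domain, sigma, n_synth, seed=0):
    """Select, measure, and generate for integer-coded categorical data."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]

    # Step 1 (simplified): a fixed chain of adjacent attribute pairs.
    pairs = [(i, i + 1) for i in range(d - 1)]

    # Step 2: measure each selected 2-way marginal with noise.
    noisy = {p: measure_marginal(data, p, domain, sigma, rng) for p in pairs}

    # Step 3: sample attribute 0, then each next attribute conditioned
    # on its predecessor in the chain.
    synth = np.zeros((n_synth, d), dtype=int)
    first = to_distribution(noisy[pairs[0]]).sum(axis=1)
    synth[:, 0] = rng.choice(domain[0], size=n_synth, p=first)
    for a, b in pairs:
        joint = to_distribution(noisy[(a, b)])
        for v in range(domain[a]):
            rows = np.flatnonzero(synth[:, a] == v)
            if rows.size == 0:
                continue
            cond = joint[v]
            if cond.sum() > 0:
                cond = cond / cond.sum()
            else:
                cond = np.full(domain[b], 1.0 / domain[b])
            synth[rows, b] = rng.choice(domain[b], size=rows.size, p=cond)
    return synth
```

A call such as `synthesize_chain(data, domain=[2, 3, 2], sigma=1.0, n_synth=1000)` returns an integer array with one row per synthetic record. The chain is chosen here only because sampling from it is simple; it gives no guarantee of consistency between overlapping noisy marginals, which is the kind of problem a principled estimation step is meant to address.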