Making evidence-based decisions requires data. In real-world applications, however, the privacy of that data is critical. Releasing synthetic data that reflects key statistical properties of the original data preserves the privacy of the individuals it describes. To this end, prior work uses differentially private data release mechanisms to provide formal privacy guarantees. However, such mechanisms often suffer from an unacceptable privacy vs. utility trade-off. We propose incorporating causal information into the training process to favorably shift this trade-off. We theoretically prove that generative models trained with additional causal knowledge provide stronger differential privacy guarantees. Empirically, we evaluate our solution by comparing different models based on variational auto-encoders (VAEs), and show that causal information improves resilience to membership inference attacks, with improvements in downstream utility.
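To make the idea of "incorporating causal information into the training process" concrete, the following is a minimal sketch, not the paper's implementation: a VAE decoder that generates each observed variable from the latent code and that variable's causal parents, in topological order of an assumed causal graph. The graph `PARENTS`, the variable names, and all layer sizes are illustrative assumptions.

```python
# Illustrative sketch only: a decoder structured by an assumed causal DAG.
# The DAG (age -> income, {age, income} -> spend) and sizes are made up.
import torch
import torch.nn as nn

PARENTS = {"age": [], "income": ["age"], "spend": ["age", "income"]}
ORDER = ["age", "income", "spend"]  # topological order of the assumed DAG
LATENT_DIM = 8

class CausalDecoder(nn.Module):
    """Decode z into features, respecting the assumed causal ordering."""
    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        # One small head per variable; its input is z plus the values of
        # that variable's causal parents.
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(latent_dim + len(PARENTS[name]), 16),
                nn.ReLU(),
                nn.Linear(16, 1),
            )
            for name in ORDER
        })

    def forward(self, z):
        values = {}
        # Generate variables in causal order so parents exist when needed.
        for name in ORDER:
            parents = [values[p] for p in PARENTS[name]]
            inp = torch.cat([z] + parents, dim=-1)
            values[name] = self.heads[name](inp)
        return torch.cat([values[n] for n in ORDER], dim=-1)

decoder = CausalDecoder()
z = torch.randn(4, LATENT_DIM)
x_hat = decoder(z)  # shape (4, 3): one column per generated variable
```

The design choice this illustrates: a standard VAE decoder maps z to all features jointly, whereas factorizing the decoder along a causal graph restricts each conditional to its parents, which is one way additional causal knowledge can constrain the learned generative model.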