Modern machine learning systems achieve great success when trained on large datasets. However, these datasets usually contain sensitive information (e.g., medical records, face images), which leads to serious privacy concerns. Differentially private generative models (DPGMs) have emerged as a solution to circumvent such concerns by generating privatized sensitive data. As with other differentially private (DP) learners, the major challenge for DPGMs is to strike a subtle balance between utility and privacy. We propose DP$^2$-VAE, a novel training mechanism for variational autoencoders (VAEs) with provable DP guarantees and improved utility via \emph{pre-training on private data}. Under the same DP constraints, DP$^2$-VAE minimizes the perturbation noise during training and hence improves utility. DP$^2$-VAE is flexible and readily extends to many other VAE variants. Theoretically, we study the effect of pre-training on private data. Empirically, we conduct extensive experiments on image datasets to demonstrate the superiority of our method over baselines under various privacy budgets and evaluation metrics.
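To make the utility-privacy trade-off concrete, recall the standard DP-SGD update that underlies most DPGM training (a generic illustration, not the specific DP$^2$-VAE mechanism): with per-example gradients $g(x_i)$ clipped to $\ell_2$ norm at most $C$ over a batch $\mathcal{B}$ of size $B$, the released gradient is
\[
\tilde{g} \;=\; \frac{1}{B}\Big(\sum_{i\in\mathcal{B}} \mathrm{clip}\big(g(x_i), C\big) \;+\; \mathcal{N}\big(0,\, \sigma^2 C^2 \mathbf{I}\big)\Big),
\]
where the noise multiplier $\sigma$ is calibrated to the privacy budget. Reducing the effective perturbation noise at a fixed budget, as DP$^2$-VAE aims to do, directly improves utility.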