Recent advances in machine learning have largely benefited from the massive amounts of accessible training data. However, large-scale data sharing has raised great privacy concerns. In this work, we propose a novel privacy-preserving data generative model based on the PATE framework (G-PATE), aiming to train a scalable differentially private data generator that preserves high utility of the generated data. Our approach leverages generative adversarial nets to generate data, combined with private aggregation among different discriminators to ensure strong privacy guarantees. Compared to existing approaches, G-PATE significantly improves the use of privacy budgets. In particular, we train a student data generator with an ensemble of teacher discriminators and propose a novel private gradient aggregation mechanism to ensure differential privacy on all information that flows from the teacher discriminators to the student generator. In addition, with random projection and gradient discretization, the proposed gradient aggregation mechanism is able to effectively deal with high-dimensional gradient vectors. Theoretically, we prove that G-PATE ensures differential privacy for the data generator. Empirically, we demonstrate the superiority of G-PATE over prior work through extensive experiments. We show that G-PATE is the first approach able to generate high-dimensional image data with high data utility under limited privacy budgets ($\epsilon \le 1$). Our code is available at https://github.com/AI-secure/G-PATE.
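To make the aggregation step concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm or privacy accounting) of how high-dimensional teacher gradients could be randomly projected, discretized into buckets, and aggregated via a noisy vote histogram. All function names, the Laplace noise mechanism, and the toy dimensions here are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_and_discretize(grads, proj, bins):
    # Randomly project each teacher's high-dimensional gradient to a
    # low-dimensional space, then discretize each coordinate into one
    # of `bins` buckets spanning [-1, 1].
    low = np.clip(grads @ proj, -1.0, 1.0)          # (n_teachers, k)
    edges = np.linspace(-1.0, 1.0, bins + 1)
    return np.clip(np.digitize(low, edges) - 1, 0, bins - 1)

def noisy_aggregate(votes, bins, noise_scale, rng):
    # For each projected coordinate, teachers "vote" for a bucket; adding
    # Laplace noise to the vote histogram before taking the argmax gives a
    # differentially private aggregate (illustrative noise mechanism only).
    n_teachers, k = votes.shape
    centers = np.linspace(-1.0, 1.0, bins, endpoint=False) + 1.0 / bins
    agg = np.empty(k)
    for j in range(k):
        hist = np.bincount(votes[:, j], minlength=bins).astype(float)
        hist += rng.laplace(scale=noise_scale, size=bins)
        agg[j] = centers[int(np.argmax(hist))]
    return agg

# Toy setup: 10 teachers, 100-dim gradients projected down to 5 dims.
d, k, bins = 100, 5, 10
proj = rng.normal(size=(d, k)) / np.sqrt(k)         # random projection matrix
teacher_grads = rng.normal(scale=0.1, size=(10, d))
votes = project_and_discretize(teacher_grads, proj, bins)
dp_grad = noisy_aggregate(votes, bins, noise_scale=1.0, rng=rng)
print(dp_grad.shape)  # (5,)
```

The noisy aggregate would then be passed to the student generator as the only signal flowing from the teachers, which is what allows the privacy analysis to cover all teacher-to-student information.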