In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, \emph{i.e.}, where data of the same individuals is distributed across multiple data holders or parties. We present a differentially private stochastic gradient descent (DP-SGD) algorithm that trains a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), which allows us to learn a probabilistic generative model under DP on such vertically partitioned data. Assuming the mixture components contain no dependencies across parties, the objective function factorizes into a sum of products of contributions computed locally by each party; MPC is then used to aggregate these contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain results comparable to the non-partitioned case.