Privacy protection via synthetic data generation often relies on differentially private statistics and model parameters to quantify theoretical security. However, these methods do not account for the privacy protection afforded by the randomness of the data-generation process itself. In this paper, we theoretically evaluate the R\'{e}nyi differential privacy of the randomness in data generation for a synthetic data generation method that uses the mean vector and the covariance matrix of an original dataset. Specifically, for a fixed $\alpha > 1$, we derive the condition on $\varepsilon$ under which the synthetic data generation satisfies $(\alpha, \varepsilon)$-R\'{e}nyi differential privacy, under a bounded neighboring condition and an unbounded neighboring condition, respectively. In particular, under the unbounded condition, when the sizes of the original dataset and the synthetic dataset are both 10 million, the mechanism satisfies $(4, 0.576)$-R\'{e}nyi differential privacy. We also show that, when translated into the traditional $(\varepsilon, \delta)$-differential privacy, the mechanism satisfies $(4.00, 10^{-10})$-differential privacy.
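The class of generators studied here can be illustrated with a minimal sketch: fit the empirical mean vector and covariance matrix of an original dataset and sample a synthetic dataset from the multivariate Gaussian they define. The dataset, its dimensions, and the Gaussian sampling step are illustrative assumptions; the paper's exact mechanism and its privacy analysis are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical original dataset: n records with d numeric features.
n, d = 1000, 3
original = rng.normal(size=(n, d))

# Fit the empirical mean vector and covariance matrix.
mean = original.mean(axis=0)
cov = np.cov(original, rowvar=False)

# Generate a synthetic dataset of the same size by sampling from the
# multivariate Gaussian defined by those two statistics; the sampling
# randomness is the source of the privacy studied in the paper.
synthetic = rng.multivariate_normal(mean, cov, size=n)

print(synthetic.shape)  # (1000, 3)
```

The synthetic records only touch the original data through the two fitted statistics, which is what makes the generation randomness itself amenable to a differential privacy analysis.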