Differential privacy schemes have been widely adopted in recent years to address data privacy protection. We propose a new Gaussian scheme combined with another data protection technique, called random orthogonal matrix masking, to achieve $(\varepsilon, \delta)$-differential privacy (DP) more efficiently. We prove that the additional matrix masking significantly reduces the rate of the noise variance required by the Gaussian scheme to achieve $(\varepsilon, \delta)$-DP in the big data setting. Specifically, when $\varepsilon \to 0$, $\delta \to 0$, and the sample size $n$ exceeds the number $p$ of attributes by $(n-p)=O(\ln(1/\delta))$, the additive noise variance required to achieve $(\varepsilon, \delta)$-DP is reduced from $O(\ln(1/\delta)/\varepsilon^2)$ to $O(1/\varepsilon)$. With much less noise added, the resulting differentially private pseudo data sets allow much more accurate inferences, and can thus significantly broaden the scope of application of differential privacy.
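To give a feel for the gap between the two rates, the following minimal sketch compares the noise variance of the standard Gaussian mechanism, $2\ln(1.25/\delta)\,\Delta^2/\varepsilon^2$ (the usual calibration, valid for $0<\varepsilon<1$), with an illustrative $c/\varepsilon$ rate. The constants `c` and `k`, the function names, and the unit sensitivity are assumptions made here for illustration; the abstract states only the asymptotic orders, not the constants of the proposed scheme.

```python
import math


def classical_gaussian_variance(eps, delta, sensitivity=1.0):
    """Noise variance of the standard Gaussian mechanism.

    Uses the usual calibration sigma^2 = 2 ln(1.25/delta) * sensitivity^2 / eps^2,
    which is stated for 0 < eps < 1.
    """
    if not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("need 0 < eps < 1 and 0 < delta < 1")
    return 2.0 * math.log(1.25 / delta) * sensitivity ** 2 / eps ** 2


def masked_rate_variance(eps, c=1.0):
    """Illustrative O(1/eps) rate; c is a hypothetical constant, not from the paper."""
    if eps <= 0.0:
        raise ValueError("need eps > 0")
    return c / eps


def extra_rows_needed(delta, k=1.0):
    """Illustrative n - p = O(ln(1/delta)); k is a hypothetical constant."""
    if not (0.0 < delta < 1.0):
        raise ValueError("need 0 < delta < 1")
    return math.ceil(k * math.log(1.0 / delta))


def variance_ratio(eps, delta):
    """How many times larger the classical variance is than the illustrative rate."""
    return classical_gaussian_variance(eps, delta) / masked_rate_variance(eps)


if __name__ == "__main__":
    for eps, delta in [(0.5, 1e-3), (0.1, 1e-5), (0.01, 1e-8)]:
        print(
            f"eps={eps}, delta={delta}: "
            f"classical={classical_gaussian_variance(eps, delta):.1f}, "
            f"illustrative masked rate={masked_rate_variance(eps):.1f}, "
            f"n-p >= {extra_rows_needed(delta)} (with k=1)"
        )
```

With both constants set to 1, the ratio of the two variances is $2\ln(1.25/\delta)/\varepsilon$, so it grows as $\varepsilon$ and $\delta$ shrink; the actual improvement depends on the constants established in the proofs.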