We study the canonical statistical task of computing the principal component from $n$ i.i.d.~data points in $d$ dimensions under $(\varepsilon,\delta)$-differential privacy. Although extensively studied in the literature, existing solutions fall short on two key aspects: ($i$) even for Gaussian data, existing private algorithms require the number of samples $n$ to scale super-linearly with $d$, i.e., $n=\Omega(d^{3/2})$, to obtain non-trivial results, while non-private PCA requires only $n=O(d)$, and ($ii$) existing techniques suffer from a non-vanishing error even when the randomness in each data point is arbitrarily small. We propose DP-PCA, a single-pass algorithm that overcomes both limitations. It is based on a private minibatch gradient ascent method that relies on {\em private mean estimation}, which adds the minimal noise required to ensure privacy by adapting to the variance of a given minibatch of gradients. For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=\tilde O(d)$. Furthermore, we provide a lower bound showing that a sub-Gaussian-style assumption is necessary for obtaining the optimal error rate.
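To make the algorithmic structure concrete, the following is a minimal Python sketch of the single-pass minibatch gradient ascent loop described above, assuming the Rayleigh-quotient objective $v \mapsto v^\top \hat\Sigma v$ on the unit sphere. The names \texttt{dp\_pca\_sketch} and \texttt{private\_mean}, and the fixed \texttt{clip\_radius} and \texttt{noise\_multiplier} parameters, are illustrative placeholders: in DP-PCA the noise scale adapts to a privately estimated minibatch variance with the full $(\varepsilon,\delta)$ calibration, which this sketch does not implement.

\begin{verbatim}
import numpy as np

def private_mean(grads, clip_radius, noise_multiplier, rng):
    """Stand-in for the paper's private mean estimation oracle:
    clip each gradient to an l2 ball, average, add Gaussian noise.
    DP-PCA instead adapts the noise to a privately estimated
    variance of the minibatch; clip_radius here is a fixed proxy."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip_radius / np.maximum(norms, 1e-12))
    mean = clipped.mean(axis=0)
    sigma = noise_multiplier * clip_radius / len(grads)
    return mean + sigma * rng.standard_normal(grads.shape[1])

def dp_pca_sketch(X, T=50, eta=0.5, clip_radius=1.0,
                  noise_multiplier=1.0, seed=0):
    """Single-pass minibatch gradient ascent for the top
    eigenvector of the empirical covariance of X (n x d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = n // T                     # disjoint minibatches: one pass over the data
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for t in range(T):
        batch = X[t * B:(t + 1) * B]
        grads = (batch @ v)[:, None] * batch  # per-sample gradients x_i <x_i, v>
        g = private_mean(grads, clip_radius, noise_multiplier, rng)
        v = v + eta * g
        v /= np.linalg.norm(v)                # project back onto the unit sphere
    return v
\end{verbatim}

Each iteration consumes a fresh disjoint minibatch, so the data are touched exactly once; the only interface to the data is the (privatized) mean of the per-sample gradients, which is what allows the noise to scale with the gradients' spread rather than a worst-case bound.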