We study the private $k$-median and $k$-means clustering problems in $d$-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non-private methods. We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy parameter. (The dimension term, $d$, can be replaced with $O(\log k)$ using standard dimension reduction techniques.) Although this worst-case guarantee is weaker than that of state-of-the-art private clustering methods, the algorithm we propose is practical, runs in near-linear time, $\tilde{O}(nkd)$, and scales to tens of millions of points. We also show that our method is amenable to parallelization in large-scale distributed computing environments. In particular, we show that our private algorithms can be implemented in a logarithmic number of MPC rounds in the sublinear memory regime. Finally, we complement our theoretical analysis with an empirical evaluation demonstrating the algorithm's efficiency and accuracy in comparison to other private clustering baselines.
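To make the combination of a hierarchical space partition with a privacy mechanism concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes points lie in $[0,1)^d$, uses a fixed grid hierarchy as a stand-in for a tree embedding, and releases per-cell counts via the Laplace mechanism under add/remove-one-point neighboring datasets. The names `noisy_tree_counts`, `cell_of`, and `laplace` are hypothetical, and enumerating every cell is only feasible for small $d$ and small depth.

```python
import itertools
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def cell_of(point, level):
    # Grid cell containing `point` at `level`; cells have side length 2**-level.
    side = 2 ** level
    return tuple(min(int(x * side), side - 1) for x in point)


def noisy_tree_counts(points, dim, depth, epsilon, seed=0):
    # Assumption: adding or removing one point changes exactly one cell count
    # per level by 1, so the L1 sensitivity over all levels is `depth`, and
    # Laplace noise of scale depth / epsilon gives epsilon-DP for the release.
    rng = random.Random(seed)
    scale = depth / epsilon
    tree = []
    for level in range(1, depth + 1):
        exact = Counter(cell_of(p, level) for p in points)
        side = 2 ** level
        # Noise is added to every cell, including empty ones; noising only
        # the occupied cells would leak which cells contain data.
        noisy = {
            cell: exact.get(cell, 0) + laplace(scale, rng)
            for cell in itertools.product(range(side), repeat=dim)
        }
        tree.append(noisy)
    return tree


if __name__ == "__main__":
    data_rng = random.Random(1)
    pts = [(data_rng.random(), data_rng.random()) for _ in range(1000)]
    levels = noisy_tree_counts(pts, dim=2, depth=3, epsilon=1.0)
    print([len(level) for level in levels])
```

A clustering routine could then run on the noisy counts alone, since anything computed from a differentially private release remains private by post-processing. How the paper actually builds its tree and selects centers is not specified in this abstract.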