Hierarchical clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous cost framework introduced by Dasgupta (2016). We show a strong lower bound for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$-additive error on an input dataset $V$. We then exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/\epsilon)$-additive error, and an exponential-time algorithm that matches the lower bound. To circumvent the lower bound, we focus on the stochastic block model, a popular model of graphs, and, under a separation assumption on the blocks, propose a private $(1+o(1))$-approximation algorithm that also recovers the blocks exactly. Finally, we perform an empirical study of our algorithms and validate their performance.
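For context, Dasgupta's framework scores a hierarchy (a rooted tree $T$ with the vertices of a weighted graph $G=(V,E,w)$ as leaves) by $\mathrm{cost}(T)=\sum_{(i,j)\in E} w_{ij}\,|\mathrm{leaves}(T_{ij})|$, where $T_{ij}$ is the subtree rooted at the least common ancestor of $i$ and $j$; minimizing it pushes heavy edges low in the tree, inside small clusters. The sketch below illustrates one generic way to make such a pipeline $\epsilon$-DP, namely input perturbation: add calibrated Laplace noise to the edge weights and post-process with any non-private hierarchical clustering routine. It is a minimal sketch under the stated assumptions (edge weights in $[0,1]$, edge-level neighboring graphs differing in one edge weight), not necessarily the algorithm achieving the $O(|V|^{2.5}/\epsilon)$ bound; the function name and the average-linkage choice are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def private_hc(W, eps, seed=None):
    """Hypothetical sketch: epsilon-DP hierarchical clustering via input
    perturbation. Assumes W is a symmetric |V| x |V| similarity matrix with
    entries in [0, 1] and edge-level privacy (neighboring graphs differ in
    a single edge weight by at most 1), so each weight has sensitivity 1."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    # Add independent Laplace(1/eps) noise to each of the n(n-1)/2 edges.
    # Neighboring inputs differ in one edge, so releasing the noisy
    # matrix satisfies eps-DP; everything after this is post-processing.
    noise = np.triu(rng.laplace(scale=1.0 / eps, size=(n, n)), k=1)
    W_noisy = W + noise + noise.T
    # Turn (possibly negative) noisy similarities into nonnegative distances.
    D = W_noisy.max() - W_noisy
    np.fill_diagonal(D, 0.0)
    condensed = D[np.triu_indices(n, k=1)]  # condensed distance vector
    return linkage(condensed, method="average")  # any non-private HC works
```

Usage, e.g. `tree = private_hc(W, eps=1.0)`, returns a SciPy linkage matrix encoding the dendrogram; because differential privacy is closed under post-processing, the privacy guarantee is unaffected by the choice of the downstream (non-private) clustering step.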