We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient $\tilde{O}(m)$ time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures. Furthermore, for average-linkage, arguably the most popular variant of HAC, we provide an algorithm that runs in $\tilde{O}(n\sqrt{m})$ time. For this variant, this is the first exact algorithm that runs in subquadratic time, as long as $m=n^{2-\epsilon}$ for some constant $\epsilon > 0$. We complement this result with a simple $\epsilon$-close approximation algorithm for average-linkage in our framework that runs in $\tilde{O}(m)$ time. As an application of our algorithms, we consider clustering points in a metric space by first using $k$-NN to generate a graph from the point set, and then running our algorithms on the resulting weighted graph. We validate the performance of our algorithms on publicly available datasets, and show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
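To make the graph setting concrete, the following is a minimal sketch of a naive average-linkage HAC on an edge-weighted graph. It is not the $\tilde{O}(n\sqrt{m})$ algorithm described above and makes no attempt at efficiency. It assumes that edge weights are similarities (larger means closer), that the average-linkage similarity of two clusters is the total weight of the edges between them divided by the product of their sizes, and that merging stops when no edges between clusters remain. The function name `graph_average_linkage_hac` and the merge-record format are hypothetical, chosen for illustration only.

```python
def graph_average_linkage_hac(n, edges):
    """Naive average-linkage HAC on a weighted similarity graph (illustrative only).

    n:     number of vertices, labelled 0..n-1
    edges: iterable of (u, v, weight), where weight is a similarity
    Returns a list of merges (cluster_a, cluster_b, similarity, new_cluster_id).
    """
    size = {v: 1 for v in range(n)}
    # adj[c][d] = total weight of the edges running between clusters c and d
    adj = {v: {} for v in range(n)}
    for u, v, w in edges:
        if u == v:
            continue  # self-loops never affect a merge decision
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    merges = []
    next_id = n
    while True:
        # Scan every remaining cluster pair joined by an edge (the naive step).
        best = None
        for a in adj:
            for b, total in adj[a].items():
                if a < b:
                    sim = total / (size[a] * size[b])  # assumed average-linkage
                    if best is None or sim > best[0]:
                        best = (sim, a, b)
        if best is None:
            break  # no edges left between clusters
        sim, a, b = best

        # Merge a and b into a new cluster c, summing weights to shared neighbours.
        c = next_id
        next_id += 1
        size[c] = size[a] + size[b]
        merged = {}
        for old in (a, b):
            for nbr, total in adj[old].items():
                if nbr != a and nbr != b:
                    merged[nbr] = merged.get(nbr, 0.0) + total
        for nbr, total in merged.items():
            adj[nbr].pop(a, None)
            adj[nbr].pop(b, None)
            adj[nbr][c] = total
        del adj[a], adj[b]
        adj[c] = merged
        merges.append((a, b, sim, c))
    return merges


if __name__ == "__main__":
    # Two tight pairs joined by one weak edge.
    for merge in graph_average_linkage_hac(4, [(0, 1, 1.0), (2, 3, 0.9), (1, 2, 0.2)]):
        print(merge)
```

Each round of this sketch rescans all inter-cluster edges, so it can take on the order of $nm$ time in total. In the metric-space application, the `edges` input would come from a $k$-NN graph built on the point set, with distances converted to similarities in some way that this sketch does not specify.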