The diameter $k$-clustering problem asks for a partition of a finite subset of $\mathbb{R}^d$ into $k$ subsets, called clusters, such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of $k$) is the agglomerative clustering algorithm with the complete linkage strategy. This algorithm has been widely used by practitioners for decades, yet it has received little theoretical study. In this paper, we analyze the agglomerative complete linkage clustering algorithm. Assuming that the dimension $d$ is a constant, we show that for any $k$ the solution computed by this algorithm is an $O(\log k)$-approximation to the diameter $k$-clustering problem. Our analysis holds not only for the Euclidean distance but for any metric that is based on a norm. Furthermore, we analyze the closely related $k$-center and discrete $k$-center problems. For the corresponding agglomerative algorithms, we deduce an approximation factor of $O(\log k)$ as well.
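To make the algorithm under analysis concrete, the following is a minimal sketch of agglomerative clustering with a complete linkage merge rule. It is an illustration under assumptions, not the implementation studied in the paper: the function names `diameter` and `complete_linkage` are hypothetical, the merge rule is assumed to be "merge the two clusters whose union has the smallest diameter", the Euclidean distance is used, and ties are broken by the first pair found. The brute-force search is chosen for readability rather than efficiency.

```python
import math
from itertools import combinations


def diameter(cluster):
    """Largest pairwise Euclidean distance in a cluster (0.0 for a singleton)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def complete_linkage(points, k):
    """Greedily merge clusters until only k remain (assumed merge rule, see lead-in)."""
    # Start with every point in its own cluster.
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters whose union has the smallest diameter.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: diameter(clusters[ij[0]] + clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        # Replace the two chosen clusters by their union.
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return clusters
```

Calling `complete_linkage(points, k)` for decreasing values of `k` traces the hierarchy mentioned above, and `max(diameter(c) for c in clusters)` gives the objective value of the diameter $k$-clustering problem for the returned partition.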