The K-Means algorithm is a popular clustering method, but it has two limitations: (1) it easily gets stuck in spurious local minima, and (2) the number of clusters k must be given a priori. To address both issues, a multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented. First, based on the structure of the spurious local minima of the K-Means problem, a multi-prototypes sampling (MPS) method is designed to select an appropriate number of multi-prototypes for data with arbitrary shapes. A theoretical proof guarantees that the multi-prototypes selected by MPS achieve a constant-factor approximation to the optimal cost of the K-Means problem. Then, a merging technique called convex merging (CM) merges the multi-prototypes to obtain a better local minimum without k being given a priori. Specifically, CM can obtain the optimal merging and estimate the correct k. By integrating these two techniques with the K-Means algorithm, MCKM provides an efficient and explainable way to escape the undesirable local minima of the K-Means problem without requiring k in advance. Experiments on synthetic and real-world data sets verify the effectiveness of the proposed algorithm.
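The abstract does not spell out how MPS or CM work, so the sketch below only illustrates the general "oversample prototypes, then merge them and read off k" idea under stated assumptions. It oversamples m > k prototypes with D² (k-means++-style) seeding, refines them with ordinary Lloyd iterations, and then merges any two prototypes within distance `tau` using single linkage. The D² oversampling and the single-linkage threshold merge are stand-ins chosen for this example, not the paper's MPS or convex merging, and the names `multi_prototype_cluster`, `m`, and `tau` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def d2_sample(points, m, rng):
    """Stand-in for MPS: pick up to m prototypes by D^2 (k-means++-style) seeding."""
    protos = [rng.choice(points)]
    while len(protos) < m:
        # Weight each point by its squared distance to the closest prototype.
        weights = [min(sq_dist(p, c) for c in protos) for p in points]
        if sum(weights) == 0:
            break  # every point already coincides with a prototype
        protos.append(rng.choices(points, weights=weights, k=1)[0])
    return protos


def lloyd(points, centers, iters=50):
    """Standard Lloyd (K-Means) refinement of the given centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep empty ones in place.
        new = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
               for c, g in zip(centers, groups)]
        if new == centers:
            break
        centers = new
    return centers


def merge_prototypes(protos, tau):
    """Stand-in for CM: prototypes within distance tau are merged (single linkage)."""
    parent = list(range(len(protos)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(protos)):
        for j in range(i + 1, len(protos)):
            if math.dist(protos[i], protos[j]) <= tau:
                parent[find(i)] = find(j)
    # Relabel the union-find roots as consecutive group ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(protos))]


def multi_prototype_cluster(points, m, tau, seed=0):
    """Oversample m prototypes, refine, merge; return point labels and estimated k."""
    rng = random.Random(seed)
    protos = lloyd(points, d2_sample(points, m, rng))
    group = merge_prototypes(protos, tau)
    labels = [group[nearest(p, protos)] for p in points]
    return labels, len(set(group))
```

For example, on two well-separated blobs of 20 points each, `multi_prototype_cluster(data, m=6, tau=3.0)` should place several prototypes in each blob and merge them down to an estimate of k = 2. In this sketch the fixed threshold `tau` does the job that the abstract attributes to CM (choosing the merging and hence k); the paper states that CM obtains the optimal merging, which a hand-picked threshold does not guarantee.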