The k-medoids algorithm is a popular variant of the k-means algorithm and is widely used in pattern recognition and machine learning. A main drawback of the k-medoids algorithm is that it can become trapped in local optima. An improved k-medoids algorithm (INCKM) was recently proposed to overcome this drawback by constructing a candidate medoid subset through a parameter selection procedure, but it may fail on imbalanced datasets. In this paper, we propose a novel incremental k-medoids algorithm (INCKPP) which dynamically increases the number of clusters from 2 to k through a nonparametric, stochastic k-means++ search procedure. Our algorithm avoids the parameter selection problem of the improved k-medoids algorithm, improves clustering performance, and handles imbalanced datasets well. Its weakness is computational efficiency. To address this issue, we propose a fast INCKPP algorithm (called INCKPP$_{sample}$) which preserves the computational efficiency of the simple and fast k-medoids algorithm while improving clustering performance. The proposed algorithm is compared with three state-of-the-art algorithms: the improved k-medoids algorithm (INCKM), the simple and fast k-medoids algorithm (FKM), and the k-means++ algorithm (KPP). Extensive experiments on both synthetic and real-world datasets, including imbalanced datasets, illustrate the effectiveness of the proposed algorithm.
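The incremental idea described above can be illustrated with a minimal sketch. This is not the authors' INCKPP implementation: the function names (`nearest`, `refine`, `kpp_pick`, `incremental_kmedoids`) are hypothetical, starting from the single best medoid of the whole dataset is an assumption, and the refinement step is a plain alternating assign-and-update loop assumed to be in the spirit of the simple and fast k-medoids algorithm. Each new medoid is drawn by k-means++ style sampling, with probability proportional to the squared distance to the nearest existing medoid.

```python
import math
import random


def nearest(points, medoids):
    """For each point, return (position of its nearest medoid, distance to it)."""
    result = []
    for p in points:
        d, j = min((math.dist(p, points[m]), j) for j, m in enumerate(medoids))
        result.append((j, d))
    return result


def refine(points, medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = [j for j, _ in nearest(points, medoids)]
        updated = []
        for j, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                # Only possible with duplicate points; keep the old medoid.
                updated.append(old)
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            best = min(
                members,
                key=lambda c: sum(math.dist(points[c], points[i]) for i in members),
            )
            updated.append(best)
        if updated == medoids:
            break
        medoids = updated
    return medoids


def kpp_pick(points, medoids, rng):
    """k-means++ style draw: probability proportional to squared distance
    to the nearest current medoid. Raises ValueError if every point
    coincides with a medoid (all weights zero)."""
    weights = [d * d for _, d in nearest(points, medoids)]
    return rng.choices(range(len(points)), weights=weights, k=1)[0]


def incremental_kmedoids(points, k, seed=0):
    """Grow the medoid set one at a time, refining after each addition."""
    rng = random.Random(seed)
    # Assumption: begin with the 1-medoid of the whole dataset.
    first = min(
        range(len(points)),
        key=lambda c: sum(math.dist(points[c], p) for p in points),
    )
    medoids = [first]
    while len(medoids) < k:
        medoids.append(kpp_pick(points, medoids, rng))
        medoids = refine(points, medoids)
    labels = [j for j, _ in nearest(points, medoids)]
    return medoids, labels
```

A call such as `incremental_kmedoids(points, k=3)` returns the indices of the chosen medoids and a cluster label for every point. Because the draw is stochastic, different seeds can give different clusterings, and the full pairwise-distance update makes this sketch quadratic in cluster size, which is the kind of cost a sampling-based variant would aim to reduce.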