Because its membership function is strictly positive, the conventional fuzzy c-means clustering method can suffer from imbalanced influence when clusters of vastly different sizes exist. That is, an exceptionally large cluster drags all the other clusters toward its center, however far away they are. To solve this problem, we propose a hybrid fuzzy-crisp clustering algorithm based on a target function that combines linear and quadratic terms of the membership function. In this algorithm, the membership of a data point in a cluster is automatically set to exactly zero if the data point is ``sufficiently'' far from the cluster center. We present the algorithm along with its geometric interpretation. The algorithm is tested on twenty simulated datasets and five real-world datasets from the UCI repository and compared with conventional fuzzy and crisp clustering methods. The proposed algorithm is shown to outperform the conventional methods on imbalanced datasets and can be competitive on more balanced datasets.
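The abstract does not state the target function, so the following is only a minimal sketch of how a linear-plus-quadratic membership term can yield exact zeros. It assumes a per-point objective of the form sum_k (u_k * d_k + 0.5 * lam * u_k^2), where d_k is the squared distance to cluster center k, subject to u_k >= 0 and sum_k u_k = 1. Under that assumption the minimizer is the Euclidean projection of -d / lam onto the probability simplex. The function name `hybrid_memberships` and the parameter `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np

def hybrid_memberships(sq_dists, lam):
    """Membership update for an ASSUMED objective (not the paper's stated one):
    minimise sum_k (u_k * d_k + 0.5 * lam * u_k**2) per data point,
    subject to u_k >= 0 and sum_k u_k = 1.
    The solution is the projection of -d / lam onto the simplex."""
    v = -np.asarray(sq_dists, dtype=float) / lam
    n, c = v.shape
    s = -np.sort(-v, axis=1)                 # each row sorted in descending order
    css = np.cumsum(s, axis=1) - 1.0         # running sums minus the simplex total
    ks = np.arange(1, c + 1)
    rho = (s - css / ks > 0).sum(axis=1)     # number of clusters kept per point
    theta = css[np.arange(n), rho - 1] / rho # per-point threshold
    return np.maximum(v - theta[:, None], 0.0)

# Two points, three clusters: the first point is very far from cluster 3.
U = hybrid_memberships([[0.1, 0.2, 9.0], [1.0, 1.0, 1.0]], lam=1.0)
print(U)
```

In this sketch the first point shares its membership between the two nearby clusters and gets a membership of exactly zero in the distant one, while the equidistant second point is split evenly. Larger values of `lam` make the result fuzzier, and smaller values push it toward a crisp assignment.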