Evaluating clustering quality and determining the optimal number of clusters are crucial tasks in cluster analysis. In this paper, the data set is characterized at multiple granularities to obtain hyper-balls, and an internal cluster evaluation index based on these hyper-balls (HCVI) is defined. Moreover, a general method for determining the optimal number of clusters based on HCVI is proposed. The proposed methods can evaluate the clustering results produced by several classic clustering methods and determine the optimal number of clusters for data sets that contain noise and clusters of arbitrary shape. Experimental results on synthetic and real data sets indicate that the new index outperforms existing indices.
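The abstract does not define HCVI or the hyper-ball construction, so the sketch below only illustrates the generic pattern that any index-based method for choosing the number of clusters follows: cluster the data for each candidate k, score each result with an internal validity index, and keep the best-scoring k. It is not the authors' method. As loudly labeled assumptions, the silhouette coefficient stands in for HCVI (larger is assumed better), plain k-means with farthest-first initialization stands in for "classic clustering methods", and the names `select_k`, `kmeans`, and `silhouette` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_centers(points: Sequence[Point], k: int) -> List[Point]:
    # Deterministic initialization (an assumption): start from the first point,
    # then repeatedly add the point farthest from its nearest chosen center.
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    # Stand-in clustering algorithm: Lloyd's k-means, returns one label per point.
    centers = farthest_first_centers(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    # Stand-in for HCVI: mean silhouette coefficient (higher means better separated).
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes s = 0 by convention
        # Mean distance to the other members of p's cluster (self-distance is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance from p to any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def select_k(
    points: Sequence[Point],
    k_values: Sequence[int],
    cluster_fn: Callable[[Sequence[Point], int], List[int]] = kmeans,
    index_fn: Callable[[Sequence[Point], Sequence[int]], float] = silhouette,
) -> Tuple[int, Dict[int, float]]:
    # Score the clustering obtained for each candidate k and return the best k.
    scores = {k: index_fn(points, cluster_fn(points, k)) for k in k_values}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Three small, well-separated square blobs.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (0, 10), (0, 11), (1, 10), (1, 11)]
    best_k, scores = select_k(data, range(2, 6))
    print(best_k, scores)
```

Because `cluster_fn` and `index_fn` are parameters, a hyper-ball-based index or a density-based clusterer suited to noisy, arbitrarily shaped data could be swapped in without changing the selection loop; whether HCVI should be maximized or minimized is not stated in the abstract.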