A new index for the internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices: the first, $ I_a $, is called the Ambiguous Index, and the second, $ I_s $, is called the Similarity Index. Both sub-indices are calculated from a density estimate for each cluster of a partition of the data. An experiment on a set of 145 datasets tests the performance of the new index and compares it with three popular internal clustering evaluation indices: the Calinski-Harabasz index, the Silhouette coefficient, and the Davies-Bouldin index. The results show that the new index improves on these three indices by 59%, 34%, and 74%, respectively.
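For readers unfamiliar with the three baseline indices, the following is a minimal sketch of their textbook definitions in plain Python. It is not the authors' code and does not implement the new index, whose formula is not given in this abstract. The function names, the helper `group`, and the toy data are all hypothetical, chosen only for illustration; the sketch assumes Euclidean distance, at least two clusters, and more points than clusters.

```python
from math import dist            # Euclidean distance (Python 3.8+)
from statistics import fmean


def centroid(points):
    # Coordinate-wise mean of a list of points.
    return tuple(fmean(c) for c in zip(*points))


def group(X, labels):
    # Hypothetical helper: map each label to the list of its points.
    clusters = {}
    for x, label in zip(X, labels):
        clusters.setdefault(label, []).append(x)
    return clusters


def calinski_harabasz(X, labels):
    # Ratio of between-cluster to within-cluster dispersion; higher is better.
    clusters = group(X, labels)
    n, k = len(X), len(clusters)
    overall = centroid(X)
    between = sum(len(p) * dist(centroid(p), overall) ** 2 for p in clusters.values())
    within = sum(dist(x, centroid(p)) ** 2 for p in clusters.values() for x in p)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(X, labels):
    # Mean over clusters of the worst scatter-to-separation ratio; lower is better.
    clusters = group(X, labels)
    cents = {l: centroid(p) for l, p in clusters.items()}
    scatter = {l: fmean(dist(x, cents[l]) for x in p) for l, p in clusters.items()}
    return fmean(
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in clusters if j != i)
        for i in clusters
    )


def silhouette(X, labels):
    # Mean of (b - a) / max(a, b) over all points; higher is better.
    clusters = group(X, labels)
    scores = []
    for x, label in zip(X, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)   # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(x, y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(fmean(dist(x, y) for y in p)
                for other, p in clusters.items() if other != label)
        scores.append((b - a) / max(a, b))
    return fmean(scores)


# Toy data (made up for illustration): two well-separated squares of points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]   # partition that matches the two squares
bad = [0, 1, 0, 1, 0, 1, 0, 1]    # partition that mixes the two squares

for name, labels in (("good", good), ("bad", bad)):
    print(name,
          calinski_harabasz(X, labels),
          silhouette(X, labels),
          davies_bouldin(X, labels))
```

On this toy data the partition that matches the two squares scores higher on the Calinski-Harabasz index and the Silhouette coefficient and lower on the Davies-Bouldin index than the mixed partition, which is the direction in which each index is meant to be read.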