A new index for the internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices: the first, $ I_a $, is called the Ambiguous Index, and the second, $ I_s $, is called the Similarity Index. Both sub-indices are calculated from a density estimate for each cluster in a partition of the data. An experiment is conducted on a set of 145 datasets to test the performance of the new index and to compare it with three popular internal clustering evaluation indices (the Calinski-Harabasz index, the Silhouette coefficient, and the Davies-Bouldin index). The results show that the new index improves on the three popular indices by 59\%, 34\%, and 74\%, respectively.
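For readers unfamiliar with the three baseline indices named above, the following is a minimal from-scratch sketch of their standard textbook definitions, using Euclidean distance. It does not implement the new index, since the abstract does not give the formulas for $ I_a $ and $ I_s $. The function names, the toy points, and the two labelings are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict


def _groups(points, labels):
    """Group points by their cluster label."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return groups


def _centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion; higher is better."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = within = 0.0
    for pts in groups.values():
        c = _centroid(pts)
        between += len(pts) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in pts)
    return (between / (k - 1)) / (within / (n - k))


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; higher is better."""
    groups = _groups(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in pts) / len(pts)
            for other, pts in groups.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points, labels):
    """Mean worst-case ratio of cluster scatter to separation; lower is better."""
    groups = _groups(points, labels)
    cents = {lab: _centroid(pts) for lab, pts in groups.items()}
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in groups.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in groups
            if j != i
        )
        for i in groups
    ]
    return sum(worst) / len(worst)


# Toy data (an assumption for illustration): two well-separated squares.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]  # labels that match the two squares
bad = [0, 1, 0, 1, 0, 1, 0, 1]   # labels that mix the two squares

for name, labels in (("good", good), ("bad", bad)):
    print(
        name,
        calinski_harabasz(points, labels),
        silhouette(points, labels),
        davies_bouldin(points, labels),
    )
```

All three functions assume at least two clusters and fewer clusters than points. In practice, scikit-learn provides these indices as `calinski_harabasz_score`, `silhouette_score`, and `davies_bouldin_score` in `sklearn.metrics`; the pure-Python version here is only meant to make the definitions concrete.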