A new interpoint distance-based measure is proposed to identify the optimal number of clusters present in a data set. Because it is designed as a nonparametric approach, it does not depend on the distribution of the given data. Since it relies only on the interpoint distances between data members, our cluster validity index applies to univariate and multivariate data measured on arbitrary scales. It also applies to observations in a space of any dimension, including settings where the number of study variables exceeds the sample size. Our proposed criterion is compatible with any clustering algorithm. It can be used either to determine the unknown number of clusters in a data set or to assess the quality of the resulting clusters. Demonstrations on synthetic and real-life data establish its superiority over well-known clustering accuracy measures from the literature.
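The abstract does not give the formula of the proposed index. The sketch below therefore substitutes a well-known interpoint-distance criterion, the average silhouette width, as a stand-in. It only illustrates the workflow the abstract describes, not the authors' index. The pipeline first reduces the data to an n x n distance matrix, which works even when the number of variables p exceeds n. It then clusters for each candidate k with any algorithm that accepts that matrix, scores each partition, and picks the best-scoring k. The helper names (`interpoint_distances`, `k_medoids`, `mean_silhouette`, `choose_k`) are hypothetical. The simple k-medoids routine with maximin initialisation is an assumed example clusterer, and the synthetic data are made up for illustration.

```python
import numpy as np


def interpoint_distances(X):
    """Euclidean n x n distance matrix; its size depends only on n, so p may exceed n."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    D = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off before sqrt
    np.fill_diagonal(D, 0.0)
    return D


def k_medoids(D, k, n_iter=100):
    """Voronoi-iteration k-medoids on a precomputed distance matrix (assumed example clusterer).

    Deterministic maximin initialisation: start at the most central point,
    then repeatedly add the point farthest from the medoids chosen so far.
    """
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                new[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1)


def mean_silhouette(D, labels):
    """Average silhouette width (stand-in validity score) computed from distances only."""
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, li in enumerate(labels):
        own = labels == li
        if own.sum() == 1:
            continue  # singleton clusters get s(i) = 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # D[i, i] = 0, so i itself adds nothing
        b = min(D[i, labels == c].mean() for c in clusters if c != li)
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s.mean()


def choose_k(X, k_values, cluster_fn=k_medoids):
    """Return the k whose partition maximises the score, plus all scores.

    cluster_fn may be any clusterer mapping (distance matrix, k) to labels.
    """
    D = interpoint_distances(X)
    scores = {k: mean_silhouette(D, cluster_fn(D, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Usage on made-up data: three well-separated groups, 30 observations, 50 variables (p > n).
rng = np.random.default_rng(0)
n_per, p = 10, 50
X = np.vstack([rng.normal(loc=3.0 * g, size=(n_per, p)) for g in range(3)])
best_k, scores = choose_k(X, range(2, 7))
print(best_k, {k: round(v, 3) for k, v in scores.items()})
```

Passing `cluster_fn` as a parameter mirrors the abstract's claim that the criterion can sit on top of any clustering algorithm. Swapping `mean_silhouette` for another distance-based score leaves the rest of the pipeline unchanged.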