Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, clustering is appropriate only when such structure is present. As such, the study of clusterability, which evaluates whether data possesses cluster structure, is an integral part of cluster analysis. However, methods for evaluating clusterability vary radically, making it challenging to select a suitable measure. In this paper, we perform an extensive comparison of clusterability measures and provide guidelines that clustering users can consult to select suitable measures for their applications.