Internal cluster validity measures (such as the Calinski-Harabasz, Dunn, or Davies-Bouldin indices) are frequently used to select the number of clusters a dataset should be partitioned into. In this paper we consider what happens if we treat such indices as objective functions in unsupervised learning tasks. Is the optimal grouping with respect to, say, the Silhouette index really meaningful? It turns out that many cluster (in)validity indices promote clusterings that match expert knowledge quite poorly. We also introduce a new, well-performing variant of the Dunn index that is built upon OWA operators and the near-neighbour graph, so that subspaces of higher density, regardless of their shapes, can be better separated from each other.
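To make the idea of using a validity index as an objective function concrete, the following minimal sketch scores two candidate partitions of a toy dataset with the classic Dunn index (smallest between-cluster distance divided by largest cluster diameter) and prefers the higher-scoring one. It is not the OWA-based variant introduced in the paper; the function name `dunn_index` and the toy data are assumptions made purely for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Classic Dunn index (illustrative; not the paper's OWA-based variant).

    Returns the smallest distance between points in different clusters
    divided by the largest distance between points in the same cluster.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest within-cluster distance (cluster diameter).
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between points of two different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single (or repeated) point
    return min_separation / max_diameter


# Toy data (hypothetical): two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
natural = [0, 0, 1, 1]   # groups the nearby points together
mixed = [0, 1, 0, 1]     # groups the distant points together

# Treating the index as an objective: keep the partition with the higher score.
candidates = {"natural": natural, "mixed": mixed}
best = max(candidates, key=lambda name: dunn_index(points, candidates[name]))
print(best)
```

On this toy data the index favours the intuitive partition; the question raised above is whether such agreement with expert labels holds in general once the index is maximised over all possible partitions.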