The rapid emergence of high-dimensional data in many application areas poses new challenges for ensemble clustering research. To cope with the curse of dimensionality, considerable recent effort in ensemble clustering has gone into subspace-based techniques. Beyond this emphasis on subspaces, however, little attention has been paid to the potential diversity of similarity/dissimilarity metrics. How to create and aggregate a large population of diversified metrics, and how to jointly exploit the multi-level diversity in large populations of metrics, subspaces, and clusters within a unified framework, remain open problems in ensemble clustering. To tackle these problems, this paper proposes a novel multi-diversified ensemble clustering approach. Specifically, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, and couple these metrics with random subspaces to form a large set of metric-subspace pairs. From the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings is constructed. Further, an entropy-based criterion is used to explore the cluster-wise diversity in ensembles, on the basis of which three ensemble clustering algorithms are presented, each incorporating a different type of consensus function. Extensive experiments on 30 high-dimensional datasets, including 18 cancer gene expression datasets and 12 image/speech datasets, demonstrate the superiority of our algorithms over state-of-the-art methods. The source code is available at https://github.com/huangdonghere/MDEC.
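The construction of metric-subspace pairs described above can be illustrated with a minimal Python sketch. It is not taken from the released MDEC code. It assumes the commonly used form of the scaled exponential similarity kernel, exp(-d(i,j)^2 / (mu * eps_ij)), where eps_ij averages d(i,j) with the mean k-nearest-neighbour distances of i and j, and it assumes that "randomizing" the kernel means drawing the neighbourhood size `k` and the scale `mu` at random. The function names, the parameter ranges, and the subspace sampling ratio are all hypothetical choices made for illustration.

```python
import numpy as np

def scaled_exp_similarity(X, k, mu):
    """Scaled exponential similarity between the rows of X (assumed kernel form)."""
    # Squared Euclidean distances; clip small negative values caused by rounding.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    d = np.sqrt(d2)
    # Mean distance from each point to its k nearest neighbours.
    # Column 0 of the sorted distances is the zero self-distance, so skip it.
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    # Local scale eps_ij: average of both neighbourhood means and d(i, j).
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    eps = np.maximum(eps, np.finfo(float).eps)  # guard against division by zero
    return np.exp(-d2 / (mu * eps))

def random_metric_subspace_pairs(X, n_pairs, subspace_ratio=0.5,
                                 k_range=(5, 20), mu_range=(0.3, 0.8), seed=0):
    """Return one similarity matrix per randomly drawn (metric, subspace) pair.

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    n_feat = max(1, int(round(subspace_ratio * dim)))
    k_high = min(k_range[1], n - 1)  # k cannot exceed the number of other points
    similarities = []
    for _ in range(n_pairs):
        # Random subspace: a subset of features sampled without replacement.
        features = rng.choice(dim, size=n_feat, replace=False)
        # Random metric: kernel parameters drawn uniformly from their ranges.
        k = int(rng.integers(k_range[0], k_high + 1))
        mu = float(rng.uniform(mu_range[0], mu_range[1]))
        similarities.append(scaled_exp_similarity(X[:, features], k, mu))
    return similarities

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(60, 200))
    sims = random_metric_subspace_pairs(data, n_pairs=5)
    print(len(sims), sims[0].shape)
```

Each returned matrix could then be passed to any base clusterer that accepts a precomputed affinity, such as spectral clustering, to obtain one member of the ensemble. The entropy-based criterion and the consensus functions are not sketched here.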