The rapid emergence of high-dimensional data in various areas has brought new challenges to current ensemble clustering research. To deal with the curse of dimensionality, considerable recent effort in ensemble clustering has gone into various subspace-based techniques. However, beyond this emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimilarity metrics. How to create and aggregate a large population of diversified metrics, and furthermore, how to jointly investigate the multi-level diversity in the large populations of metrics, subspaces, and clusters within a unified framework, remains a surprisingly open problem in ensemble clustering. To tackle this problem, this paper proposes a novel multidiversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, and then couple these metrics with random subspaces to form a large set of metric-subspace pairs. Based on the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings is then constructed. Further, an entropy-based criterion is used to explore the cluster-wise diversity in the ensemble, based on which three specific ensemble clustering algorithms are presented by incorporating three types of consensus functions. Extensive experiments on 30 high-dimensional datasets, including 18 cancer gene expression datasets and 12 image/speech datasets, demonstrate the superiority of our algorithms over state-of-the-art methods. The source code is available at https://github.com/huangdonghere/MDEC.
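The construction of metric-subspace pairs can be illustrated with a minimal, non-authoritative sketch in Python (standard library only). It assumes the scaled exponential similarity kernel takes the form popularized by similarity network fusion, W(i,j) = exp(-d(i,j)^2 / (mu * eps_ij)), where eps_ij averages the mean k-nearest-neighbor distance of x_i, that of x_j, and d(i,j) itself. It further assumes that "randomizing" the kernel means drawing the scale mu uniformly from a range. The function names (`scaled_exp_similarity`, `random_metric_subspace_pairs`), the subspace ratio, the mu range, and k are hypothetical choices for illustration, not the parameters of the released MDEC code.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scaled_exp_similarity(data, k, mu):
    """SNF-style scaled exponential similarity matrix (assumed kernel form).

    W[i][j] = exp(-d(i, j)^2 / (mu * eps_ij)), where eps_ij is the average of
    the mean k-NN distance of point i, that of point j, and d(i, j).
    Requires at least two points.
    """
    n = len(data)
    k = max(1, min(k, n - 1))  # cannot use more neighbours than other points
    dist = [[euclidean(data[i], data[j]) for j in range(n)] for i in range(n)]
    # Mean distance from each point to its k nearest neighbours (self excluded).
    knn_mean = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        knn_mean.append(sum(nearest) / len(nearest))
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            eps = (knn_mean[i] + knn_mean[j] + dist[i][j]) / 3.0
            # eps == 0 only for exact duplicates; treat them as fully similar.
            sim[i][j] = math.exp(-dist[i][j] ** 2 / (mu * eps)) if eps > 0 else 1.0
    return sim


def random_metric_subspace_pairs(data, m, subspace_ratio=0.5,
                                 mu_range=(0.2, 0.8), k=3, seed=0):
    """Build m hypothetical (features, mu, similarity) triples.

    Each pair draws a random feature subset (the subspace) and a random
    kernel scale mu (the randomized metric), then computes the similarity
    matrix of the data projected onto that subspace.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sub_dim = max(1, round(subspace_ratio * dim))
    pairs = []
    for _ in range(m):
        features = sorted(rng.sample(range(dim), sub_dim))
        mu = rng.uniform(*mu_range)  # assumption: randomize only the scale mu
        projected = [[row[f] for f in features] for row in data]
        pairs.append((features, mu, scaled_exp_similarity(projected, k, mu)))
    return pairs


if __name__ == "__main__":
    gen = random.Random(42)
    # Toy data: two well-separated groups of five points in 6 dimensions.
    data = ([[gen.gauss(0.0, 0.3) for _ in range(6)] for _ in range(5)]
            + [[gen.gauss(3.0, 0.3) for _ in range(6)] for _ in range(5)])
    for features, mu, sim in random_metric_subspace_pairs(data, m=3):
        # Within-group sim[0][1] is expected to be much larger than cross-group sim[0][9].
        print(features, round(mu, 3), round(sim[0][1], 3), round(sim[0][9], 3))
```

Each similarity matrix would then be handed to some similarity-based clusterer to produce one base clustering of the ensemble; that step, the entropy-based cluster-wise criterion, and the three consensus functions are deliberately left out of this sketch.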