Various methods have been developed within the ensemble and consensus clustering literature to combine inference across multiple sets of results for unsupervised clustering. Reporting results from a single 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection. It also yields inferences that are sensitive to the particular model, parameters, and assumptions chosen, especially when the sample size or the cluster sizes are small. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use a combination of clustering internal validation criteria as a novel approximation of the posterior model probability for weighting the results from each model. From a combined posterior similarity matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. This method is implemented in an accompanying R package. We explore the performance of this approach through a case study that aims to identify probabilistic clusters of individuals based on electroencephalography (EEG) data. We also use simulated datasets to explore the ability of the proposed technique to identify robust integrated clusters with varying levels of separation between subgroups, and with varying numbers of clusters between models.
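The weighted averaging step described above can be illustrated with a minimal sketch. It is not the clusterBMA implementation and does not reproduce the R package's interface. It assumes each model supplies a hard label vector and a single non-negative validation score; the function names (`similarity_matrix`, `normalise_weights`, `consensus_matrix`) and the score values are hypothetical, and the scores are simply normalised to sum to one as a stand-in for the approximate posterior model probabilities. The symmetric simplex matrix factorisation step is omitted.

```python
from typing import List, Sequence


def similarity_matrix(labels: Sequence[int]) -> List[List[float]]:
    """Pairwise co-clustering matrix: 1.0 where items i and j share a label."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def normalise_weights(scores: Sequence[float]) -> List[float]:
    """Rescale non-negative scores to sum to one (assumed weighting scheme)."""
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return [s / total for s in scores]


def consensus_matrix(clusterings: Sequence[Sequence[int]],
                     scores: Sequence[float]) -> List[List[float]]:
    """Weighted average of the per-model similarity matrices."""
    if len(clusterings) != len(scores):
        raise ValueError("need exactly one score per clustering")
    weights = normalise_weights(scores)
    n = len(clusterings[0])
    consensus = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(clusterings, weights):
        if len(labels) != n:
            raise ValueError("all clusterings must label the same items")
        sim = similarity_matrix(labels)
        for i in range(n):
            for j in range(n):
                consensus[i][j] += weight * sim[i][j]
    return consensus


# Three hypothetical models clustering the same four items.
clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],  # same partition as the first model, labels permuted
]
scores = [2.0, 1.0, 1.0]  # hypothetical validation scores, one per model
consensus = consensus_matrix(clusterings, scores)
```

Each entry of `consensus` is the weighted share of models that place the two items in the same cluster, so relabelling clusters within a model does not change the result. Working with pairwise similarities rather than raw labels is also what allows models with different numbers of clusters to be averaged.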