Hierarchical and k-medoids clustering are deterministic clustering algorithms that operate on pairwise distances. Using the same pairwise distances, we propose a novel stochastic clustering method based on random partition distributions. We call the method CaviarPD, for cluster analysis via random partition distributions. CaviarPD first samples clusterings from a random partition distribution and then uses search algorithms to find the cluster estimate that minimizes an expected loss computed from these samples. We compare CaviarPD with hierarchical and k-medoids clustering in eight case studies. Cluster estimates from our method are competitive with those of hierarchical and k-medoids clustering, and they do not require the subjective choice of linkage method that hierarchical clustering demands. Furthermore, our distribution-based procedure provides an intuitive graphical representation for assessing clustering uncertainty.
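The two-stage idea described above (sample clusterings from a distance-based random partition distribution, then pick the estimate with the smallest expected loss) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the CaviarPD method itself: the sequential allocation rule in `sample_partition`, the exponential similarity with its `temperature` parameter, the `mass` parameter, the use of Binder loss, and the restriction of the search to the sampled partitions are all simple stand-ins.

```python
import numpy as np


def sample_partition(dist, mass=1.0, temperature=3.0, rng=None):
    """Draw one clustering from an illustrative distance-based partition distribution.

    Items are allocated in random order. Each item joins an existing cluster
    with probability proportional to its summed similarity to that cluster's
    members, or opens a new cluster with probability proportional to `mass`.
    (Hypothetical allocation rule, chosen only for this sketch.)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = dist.shape[0]
    sim = np.exp(-temperature * dist)  # similarity decays with distance
    order = rng.permutation(n)
    labels = np.full(n, -1)
    n_clusters = 0
    for t, i in enumerate(order):
        placed = order[:t]  # items allocated so far
        # Summed similarity of item i to each existing cluster.
        weights = [sim[i, placed[labels[placed] == c]].sum() for c in range(n_clusters)]
        weights.append(mass)  # last slot: open a new cluster
        weights = np.asarray(weights)
        choice = rng.choice(n_clusters + 1, p=weights / weights.sum())
        labels[i] = choice
        if choice == n_clusters:
            n_clusters += 1
    return labels


def coclustering(labels):
    """Binary matrix whose (i, j) entry is 1 when items i and j share a cluster."""
    return (labels[:, None] == labels[None, :]).astype(float)


def estimate_clustering(samples):
    """Return the sampled clustering with the smallest estimated expected Binder loss.

    The expected Binder loss of a candidate is, up to a constant factor, the sum
    over pairs of |candidate co-clustering indicator - co-clustering proportion|.
    """
    psm = np.mean([coclustering(s) for s in samples], axis=0)
    losses = [np.abs(coclustering(s) - psm).sum() for s in samples]
    return samples[int(np.argmin(losses))], psm


# Toy data: two well-separated groups of points in the plane.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(10.0, 0.3, (10, 2))])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

samples = [sample_partition(dist, rng=rng) for _ in range(200)]
estimate, psm = estimate_clustering(samples)
print("estimated labels:", estimate)
```

The matrix `psm` of pairwise co-clustering proportions is the kind of sample summary that can be plotted as a heatmap to inspect clustering uncertainty: entries near 0 or 1 indicate pairs whose grouping is stable across samples, while intermediate values flag uncertain pairs.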