Clustering methods group data points together and assign them group-level labels. However, it has been difficult to evaluate the confidence of clustering results. Here, we introduce a novel method that not only finds robust clusters but also provides a confidence score for the label of each data point. Specifically, we reformulate label-propagation clustering by modeling it after forest fire dynamics. The method has only one parameter, a fire temperature term that describes how easily a label propagates from one node to the next. By iteratively starting label propagations on a graph, we can discover the number of clusters in a dataset with minimal prior assumptions. Further, we can validate our predictions and estimate the posterior probability distribution of the labels using Monte Carlo simulations. Lastly, our iterative method is inductive and does not need to be retrained when new data arrive. We describe the method and summarize how it performs on common clustering benchmarks.
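To make the idea of fire-like label propagation concrete, the following is a minimal sketch and not the authors' implementation. The abstract specifies only a single fire temperature parameter, label propagation on a graph, and Monte Carlo validation; everything else here is an assumption made for illustration: the k-nearest-neighbor graph with Gaussian weights (`knn_affinity`), the ignition rule `heat >= 1`, and the co-clustering estimate of label probabilities (`label_confidence`). All function names are hypothetical.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Symmetric k-nearest-neighbor graph with Gaussian edge weights.
    Assumed graph construction; assumes no duplicate points in X."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    sigma = dist[rows, cols].mean()               # bandwidth: mean neighbor distance
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-(dist[rows, cols] / sigma) ** 2)
    return np.maximum(A, A.T)                     # make the graph undirected

def forest_fire(A, fire_temp, rng):
    """Start fires at random unlabeled nodes until every node carries a label."""
    n = A.shape[0]
    degree = A.sum(axis=1)
    labels = np.full(n, -1)
    label = 0
    while (labels < 0).any():
        # Ignite a new fire (a new cluster label) at a random unlabeled node.
        seed = rng.choice(np.flatnonzero(labels < 0))
        labels[seed] = label
        while True:
            burning = labels == label
            # Heat on each node: its share of edge weight that leads to
            # burning nodes, scaled by the fire temperature.
            heat = fire_temp * A[:, burning].sum(axis=1) / degree
            ignite = (labels < 0) & (heat >= 1.0)  # assumed ignition threshold
            if not ignite.any():
                break                              # this fire has died out
            labels[ignite] = label
        label += 1
    return labels

def label_confidence(A, fire_temp, n_runs=50, seed=0):
    """Monte Carlo estimate of per-point label probabilities.
    Repeats the clustering with different random fire seeds and measures how
    often each point is co-clustered with the members of each reference cluster."""
    rng = np.random.default_rng(seed)
    reference = forest_fire(A, fire_temp, rng)
    n = A.shape[0]
    co = np.zeros((n, n))
    for _ in range(n_runs):
        run = forest_fire(A, fire_temp, rng)
        co += run[:, None] == run[None, :]         # co-clustering indicator
    co /= n_runs
    onehot = np.eye(reference.max() + 1)[reference]  # shape (n, n_clusters)
    votes = co @ onehot
    return reference, votes / votes.sum(axis=1, keepdims=True)
```

In this sketch a higher `fire_temp` lets a fire ignite nodes that share only a small fraction of their edge weight with burning nodes, so fewer and larger clusters result; a lower value makes fires die out sooner. Each row of the returned probability matrix sums to one, and a point whose largest entry is well below one sits near a cluster boundary under these assumptions.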