The paper tackles the problem of clustering multiple networks that do not share the same set of vertices into groups of networks with similar topology. A model-based statistical approach using a finite mixture of stochastic block models is proposed. A clustering is obtained by maximizing the integrated classification likelihood criterion. This is done by a hierarchical agglomerative algorithm that starts from singleton clusters and successively merges clusters of networks. The algorithm thus computes a sequence of nested clusterings, which can be represented by a dendrogram that provides valuable insights into the collection of networks. Because a Bayesian framework is used, model selection is automatic: the algorithm stops when the best number of clusters is attained. The algorithm is computationally efficient when carefully implemented. Aggregating groups of networks requires a means to overcome the label-switching problem of the stochastic block model and to match the block labels of the graphs. To address this problem, a new tool is proposed based on a comparison of the graphons of the associated stochastic block models. The clustering approach is assessed on synthetic data. An application to a collection of ecological networks illustrates the interpretability of the obtained results.
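The abstract does not detail the graphon-based matching tool. As a rough illustration of why comparing graphons sidesteps label switching, the following minimal sketch assumes one common convention, which is not necessarily the paper's: blocks are relabeled by increasing expected normalized degree, and two stochastic block models are then compared through the L2 distance between their step-function graphons. The function names `canonical_order` and `graphon_l2_distance` are hypothetical, and the ordering is only well defined when the block degrees are distinct.

```python
from bisect import bisect_right
from itertools import accumulate
from math import sqrt


def canonical_order(pi, theta):
    """Relabel blocks by increasing expected normalized degree.

    pi    : list of block proportions (sums to 1)
    theta : symmetric K x K list of lists of connection probabilities
    Assumption: block degrees are distinct, otherwise the order is ambiguous.
    """
    k = len(pi)
    degree = [sum(theta[a][b] * pi[b] for b in range(k)) for a in range(k)]
    order = sorted(range(k), key=lambda a: degree[a])
    new_pi = [pi[a] for a in order]
    new_theta = [[theta[a][b] for b in order] for a in order]
    return new_pi, new_theta


def graphon_l2_distance(pi_a, theta_a, pi_b, theta_b):
    """L2 distance between the step-function graphons of two SBMs."""
    pi_a, theta_a = canonical_order(pi_a, theta_a)
    pi_b, theta_b = canonical_order(pi_b, theta_b)

    # Breakpoints of each partition of [0, 1]; force the last cut to 1.0
    # to guard against rounding in the cumulative sums.
    cuts_a = [0.0] + list(accumulate(pi_a))
    cuts_b = [0.0] + list(accumulate(pi_b))
    cuts_a[-1] = cuts_b[-1] = 1.0

    # Common refinement: both graphons are constant on each grid cell.
    grid = sorted(set(cuts_a) | set(cuts_b))
    cells = []
    for lo, hi in zip(grid, grid[1:]):
        mid = (lo + hi) / 2
        block_a = bisect_right(cuts_a, mid) - 1
        block_b = bisect_right(cuts_b, mid) - 1
        cells.append((hi - lo, block_a, block_b))

    total = 0.0
    for width_u, a_u, b_u in cells:
        for width_v, a_v, b_v in cells:
            diff = theta_a[a_u][a_v] - theta_b[b_u][b_v]
            total += diff * diff * width_u * width_v
    return sqrt(total)


# The same SBM written with two different block labelings.
pi_1, theta_1 = [0.4, 0.6], [[0.9, 0.1], [0.1, 0.3]]
pi_2, theta_2 = [0.6, 0.4], [[0.3, 0.1], [0.1, 0.9]]
same_model_distance = graphon_l2_distance(pi_1, theta_1, pi_2, theta_2)

# Two one-block models (Erdos-Renyi graphs) with different densities.
density_distance = graphon_l2_distance([1.0], [[0.5]], [1.0], [[0.2]])
```

Because the distance is computed after relabeling, two parameterizations of the same model that differ only by a permutation of block labels are at distance zero, which is the property needed before clusters of networks can be merged.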