Graphs are widely used to describe systems made up of many interacting components and to understand the structure of their interactions. Various statistical models describe this structure as the result of a combination of constraints and randomness. Model selection techniques aim to automatically identify the best model, and the best set of parameters, for a given graph. To do so, most authors rely on the minimum description length (MDL) paradigm and apply it to graphs by considering the entropy of probability distributions defined on graph ensembles. In this paper, we introduce edge probability sequential inference, a new approach to model selection that relies on probability distributions defined on edge ensembles. From a theoretical point of view, we show that this methodology provides a more consistent ground for statistical inference than existing techniques, because it relies on multiple realizations of the random variable. It also provides better guarantees against overfitting, because it makes it possible to keep the number of model parameters below the number of observations. Experimentally, we illustrate the benefits of this methodology in two situations: inferring the partition of a stochastic blockmodel, and choosing, for a given graph, the more relevant model between the stochastic blockmodel and the configuration model.
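The overfitting concern mentioned above can be made concrete with a small sketch. It is not the edge probability sequential inference method, whose details are not given here; it computes the maximum-likelihood log-likelihood of a graph under a plain Bernoulli stochastic blockmodel for a given partition, which is the kind of graph-ensemble quantity that description-length approaches start from before adding a model-complexity term. The function name `sbm_log_likelihood` and its arguments are hypothetical, chosen for illustration only.

```python
import math


def sbm_log_likelihood(n, edges, partition):
    """Maximum-likelihood log-likelihood (in nats) of an undirected simple graph
    under a Bernoulli stochastic blockmodel with a fixed block assignment.

    Hypothetical helper for illustration; not the method of the paper.
    n: number of nodes, labeled 0..n-1
    edges: iterable of (u, v) pairs with u != v, each unordered pair listed once
    partition: list mapping each node to an integer block label
    """
    # Count nodes per block.
    sizes = {}
    for b in partition[:n]:
        sizes[b] = sizes.get(b, 0) + 1

    # Count observed edges per unordered block pair (r, s) with r <= s.
    counts = {}
    for u, v in edges:
        key = tuple(sorted((partition[u], partition[v])))
        counts[key] = counts.get(key, 0) + 1

    ll = 0.0
    blocks = sorted(sizes)
    for i, r in enumerate(blocks):
        for s in blocks[i:]:
            # Number of possible node pairs between (or within) blocks r and s.
            if r == s:
                pairs = sizes[r] * (sizes[r] - 1) // 2
            else:
                pairs = sizes[r] * sizes[s]
            if pairs == 0:
                continue
            m = counts.get((r, s), 0)
            p = m / pairs  # maximum-likelihood edge probability for this block pair
            # Bernoulli log-likelihood; terms of the form 0 * log(0) are taken as 0.
            if m > 0:
                ll += m * math.log(p)
            if m < pairs:
                ll += (pairs - m) * math.log(1 - p)
    return ll


# Two disjoint triangles on six nodes.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(sbm_log_likelihood(6, edges, [0] * 6))             # one block
print(sbm_log_likelihood(6, edges, [0, 0, 0, 1, 1, 1]))  # two blocks
print(sbm_log_likelihood(6, edges, list(range(6))))      # one block per node
```

With one block, the log-likelihood is about -10.10; the two-block partition that matches the triangles reaches 0, but so does the degenerate partition with one block per node, whose block-pair probabilities are as numerous as the observed node pairs. Likelihood alone therefore cannot discriminate between these partitions, which is why a penalty on the number of parameters, or a way to keep that number below the number of observations, is needed.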